Abstract

Belief propagation is known to perform extremely well in many practical statistical infer- 
ence and learning problems using graphical models, even in the presence of multiple loops. 
The use of the belief propagation algorithm on graphical models with loops is referred to 
as Loopy Belief Propagation (LBP). Various sufficient conditions for convergence of LBP 
have been presented; however, general necessary conditions for its convergence to a unique 
fixed point remain unknown. Because the approximation of beliefs to true marginal prob- 
abilities has been shown to relate to the convergence of LBP, several methods have been 
explored whose aim is to obtain distance bounds on beliefs when LBP fails to converge. 
In this paper, we derive uniform and non-uniform error bounds on LBP, which are tighter 
than existing ones in literature, and use these bounds to study the dynamic behavior of the 
sum-product algorithm. We subsequently use these bounds to derive sufficient conditions 
for the convergence of the sum-product algorithm, and analyze the relation between con- 
vergence of LBP and sparsity and walk-summability of graphical models. We finally use 
the bounds derived to investigate the accuracy of LBP, as well as the scheduling priority 
in asynchronous LBP. 

Keywords: Graphical Model, Bayesian Networks, Markov Random Fields, Loopy Belief 
Propagation, Error Analysis. 

1. Introduction 

Probabilistic inference for large-scale multivariate random variables is computationally
very expensive. Belief propagation (BP) algorithms are designed to reduce the computational
burden by exploiting the factorization of joint density functions captured by the topological
structure of graphical models [Bishop (2006); Jordan (1999); Kschischang et al. (2001);
Wainwright and Jordan (2008)]. BP is known to converge to the exact inference on acyclic
graphs (i.e. trees) or graphs that contain a single loop. In the case of graphs with multiple
loops, BP results in an iterative method referred to as loopy belief propagation (LBP). The
use of LBP generally provides remarkably good approximations in real-world applications,
e.g., turbo decoding and stereo matching [McEliece et al. (1998); Sun et al. (2003)].

Because LBP does not always converge, sufficient conditions for its convergence have
been extensively investigated in the past using various approaches [Tatikonda and Jordan
(2002); Heskes (2004); Ihler et al. (2005); Mooij and Kappen (2007)]. Necessary conditions
for convergence of LBP, however, remain unknown. Tatikonda and Jordan (2002) related
convergence of LBP to the uniqueness of a sequence of Gibbs measures defined on the
associated computation tree. They subsequently developed a testable sufficient condition
for convergence of LBP by applying Simon's condition [Georgii (1988)]. Heskes (2004)
presented sufficient conditions for uniqueness of fixed points in LBP by relying on the 
uniqueness of minima of the Bethe free energy. He related the strength of the potentials 
with the convergence of the LBP algorithm, which leads to better sufficient conditions than 
those exclusively relying on the structure of the graph. 

Recently, several papers have investigated the message updating functions of the LBP
algorithm as contractive mappings. Ihler et al. (2005) analyzed the contractive dynamics of
message-error propagation in belief networks using the dynamic-range measure as a metric, and
obtained error bounds and sufficient conditions for convergence of LBP message passing.
Mooij and Kappen (2007) derived sufficient conditions for convergence of LBP based on
quotient norms of contractive mappings, which are invariant to scaling and shown to be
valid for potential functions containing zeros.

For Gaussian graphical models, Malioutov et al. (2006) related the convergence of means
and variances to walk sums and defined walk-summability with respect to the spectral radius of
the partial correlation coefficient matrix. For binary graphs, Watanabe and Fukumizu (2009)
presented an edge zeta function based on weighted prime cycles, and related the convexity
of the Bethe free energy to the determinant formula of the edge zeta function. They showed
a similar walk-summability of binary graphs by relating the spectrum of the correlation coefficient
matrix to the Hessian of the Bethe free energy. For general graphical models, Mooij and Kappen
(2007) derived certain interaction coefficients between random variables based on the strength of
potential functions, and related the spectral radius of the coefficient matrix to the convergence
of LBP. Motivated by these analyses, we define walk-summability for general graphs
and compare it with other existing convergence conditions.

Although the beliefs may not be true marginal probabilities when the LBP algorithm
converges, they have been shown to provide good approximations by Weiss (2000). When
the LBP algorithm does not converge, however, beliefs are not good approximations of
true marginals because the Bethe free energy does not provide a good approximation of
the Gibbs-Helmholtz free energy [Yedidia et al. (2004)]. Exactness and accuracy of the
LBP algorithm have consequently gained interest in recent years. Tatikonda (2003) derived
bounds on exact marginals by relying on the girth of the graph (i.e. the number of
edges in the shortest cycle in the graph) and the properties of Dobrushin's interdependence
matrix [Salas and Sokal (1997)]. Taga and Mase (2006a) used Dobrushin's theorem to
present a distance bound on the marginal probabilities. Ihler (2007) introduced a distance
bound on the error between beliefs and marginals based on recent results for computing
marginal probabilities for pairwise Markov random fields using Self-Avoiding Walk (SAW)
trees [Weitz (2006)]. Mooij and Kappen propagate bounds on marginal probabilities over
a subtree or the SAW tree of the factor graph, and demonstrate that their bounds perform
well in terms of accuracy and computation time of LBP.

Several investigators have explored the consequence of scheduling on the convergence of
BP. Taga and Mase (2006b) discussed the impatient and lazy belief propagation algorithms
and showed that the former is expected to converge faster than the latter. Elidan et al.
(2006) proposed a residual belief propagation algorithm, which schedules messages in an
informed manner, thus significantly reducing the running time needed for convergence of LBP.
Inspired by the work of Elidan et al. (2006), Sutton and McCallum (2007) further increased the
rate of convergence by estimating the residual rather than computing it directly.

In this paper, we derive tight error bounds on LBP and use these bounds to study the
dynamics (error, convergence, accuracy, and scheduling) of the sum-product algorithm.[1]
Specifically, in Section 2 and Section 3 we rely on the contractive mapping property of
message errors to present novel uniform and non-uniform distance bounds between multiple
fixed-point solutions. Several graphical networks are investigated and used to demonstrate
that the proposed distance bounds are tighter than existing bounds. We subsequently use
these bounds to derive uniform and non-uniform sufficient conditions for convergence of the
sum-product algorithm. Moreover, in Section 4 we analyze the relation between convergence
and sparsity of graphs, and extend the convergence perspective of walk-summability
from Gaussian graphical models to general graphical models. In Section 5 we present
bounds on the distance between beliefs and true marginals by applying SAW trees and
show that the proposed bounds can be used to improve existing bounds. Furthermore, in
Section 6 we explore the use of the upper-bound on message errors as a criterion to rank
the priority of message passing for scheduling in asynchronous LBP. We then present a case
study of LBP by studying its dynamics on completely uniform graphs and analyzing its true
fixed points and message-error functions in Section 7. We conclude the paper in Section 8.



2. Message-Error Propagation for the Sum-Product Algorithm 

Belief propagation originated from exact inference on tree-structured graphical models,
though for graphs with loops it also performs remarkably well as an approximate inference method.
BP is called the sum-product algorithm when used for marginalization of the global distribution,
and the max-product algorithm when used to compute the Maximum-A-Posteriori (MAP) estimate.
In this paper, we mainly discuss the sum-product algorithm for graphs with loops.

2.1 Loopy Belief Propagation Updates 

Let us consider a general graphical model $G = (V, E)$ whose distribution factors as follows:

$$P(X) = \frac{1}{Z} \prod_{(s,t)\in E} \psi_{st}(x_s, x_t) \prod_{s\in V} \psi_s(x_s), \tag{1}$$

where $Z$ is a normalization factor, $\psi_{st}(x_s, x_t)$ is the pairwise potential function between
random variables $x_s$ and $x_t$, and $\psi_s(x_s)$ is the single node potential function on $x_s$. $(s,t)$
denotes an undirected edge, $V$ is the set of nodes, and $E$ is the set of edges. We assume
that all the potential functions are positive.

Fig. 1(a) illustrates the message passing mechanism used in BP. The updating rule of
the sum-product algorithm for the message sent by node $t$ to its neighbor node $s$ at iteration
$i+1$ is:

$$m^{i+1}_{ts}(x_s) \propto \int \psi_{ts}(x_t, x_s)\, \psi_t(x_t) \prod_{u\in\Gamma_t\setminus s} m^{i}_{ut}(x_t)\, dx_t, \tag{2}$$

where $\Gamma_t$ is the set of neighbors of node $t$. The belief, or pseudo-marginal probability of $x_t$,
on node $t$ at iteration $i$, is:

$$B^{i}_{t}(x_t) \propto \psi_t(x_t) \prod_{u\in\Gamma_t} m^{i}_{ut}(x_t). \tag{3}$$

Figure 1: Graphical models: (a) message passing in a portion of a belief network; (b) a
simple graph; and (c) Bethe tree (all nodes and edges) and Self-Avoiding Walk
tree (black solid only) of (b).

1. A preliminary version of some of the error bounds presented in this paper has appeared in Shi et al.



A stable fixed point has been reached if $m^{i}_{ts}(x_s) = m^{i+1}_{ts}(x_s)$, $\forall s \in V$. The pairwise belief
of random variables $x_s, x_t$ at iteration $i$ is defined as:

$$B^{i}_{ts}(x_t, x_s) \propto \psi_{ts}(x_t, x_s)\, \psi_t(x_t)\, \psi_s(x_s) \prod_{u\in\Gamma_t\setminus s} m^{i}_{ut}(x_t) \prod_{p\in\Gamma_s\setminus t} m^{i}_{ps}(x_s). \tag{4}$$

The computation tree, first introduced in Wiberg (1996), is widely applied in the analysis
of LBP. The Bethe tree and the SAW tree are two types of computation trees used in Ihler (2007),
which will also be used in the rest of the paper. Both the Bethe tree and the SAW tree are
tree-structured unwrappings of a graph $G$ from some node $v$. The Bethe tree, denoted as
$T_B(G, v, n)$, contains all paths of length $n$ from $v$ that do not backtrack, while the SAW tree,
denoted as $T_{SAW}(G, v, n)$, contains all paths of length $n < |V| + 1$ that do not backtrack
and have all nodes on the path unique. The belief on node $v$ at iteration $n$ in synchronous
LBP is equivalent to the exact marginal of the root $v$ in the $n$-level Bethe tree.

Figure 1(c) illustrates the Bethe tree and the SAW tree for the graphical model in
Figure 1(b). For synchronous BP, each iteration of Equations (2), (3) and (4) corresponds
to a level in the Bethe tree.
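As an illustration of the synchronous updates described above, the following is a minimal sketch for discrete variables, with integrals replaced by sums. The function names `loopy_bp` and `exact_marginals`, the dictionary-based data layout, and the uniform message initialization are assumptions made for this example and are not taken from the paper.

```python
import itertools
import math


def loopy_bp(nodes, edges, node_pot, edge_pot, iters=50):
    """Synchronous sum-product on a pairwise model with discrete states.

    node_pot[s][x_s] and edge_pot[(s, t)][x_s][x_t] are positive numbers.
    Returns normalized beliefs {s: [...]} (hypothetical layout).
    """
    nbrs = {s: [] for s in nodes}
    for s, t in edges:
        nbrs[s].append(t)
        nbrs[t].append(s)

    def psi(t, s, xt, xs):
        # Look up the pairwise potential regardless of stored edge orientation.
        if (t, s) in edge_pot:
            return edge_pot[(t, s)][xt][xs]
        return edge_pot[(s, t)][xs][xt]

    # m[(t, s)][x_s] is the message from t to s, initialized uniformly.
    m = {(t, s): [1.0 / len(node_pot[s])] * len(node_pot[s])
         for s in nodes for t in nbrs[s]}
    for _ in range(iters):
        new = {}
        for (t, s) in m:
            out = []
            for xs in range(len(node_pot[s])):
                total = 0.0
                for xt in range(len(node_pot[t])):
                    prod = node_pot[t][xt]
                    for u in nbrs[t]:
                        if u != s:  # product over neighbors of t except s
                            prod *= m[(u, t)][xt]
                    total += psi(t, s, xt, xs) * prod
                out.append(total)
            z = sum(out)
            new[(t, s)] = [v / z for v in out]
        m = new  # synchronous schedule: all messages updated together

    beliefs = {}
    for t in nodes:
        b = [node_pot[t][xt] * math.prod(m[(u, t)][xt] for u in nbrs[t])
             for xt in range(len(node_pot[t]))]
        z = sum(b)
        beliefs[t] = [v / z for v in b]
    return beliefs


def exact_marginals(nodes, edges, node_pot, edge_pot):
    """Brute-force marginals of the factorized distribution, for small models."""
    marg = {s: [0.0] * len(node_pot[s]) for s in nodes}
    ranges = [range(len(node_pot[s])) for s in nodes]
    for assign in itertools.product(*ranges):
        x = dict(zip(nodes, assign))
        w = math.prod(node_pot[s][x[s]] for s in nodes)
        w *= math.prod(edge_pot[(s, t)][x[s]][x[t]] for (s, t) in edges)
        for s in nodes:
            marg[s][x[s]] += w
    return {s: [v / sum(marg[s]) for v in marg[s]] for s in nodes}
```

On a tree the beliefs returned by `loopy_bp` should agree with `exact_marginals` once the number of iterations exceeds the tree diameter; on graphs with loops the two generally differ, which is the gap the error bounds in this paper aim to quantify.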



2.2 Approaches to Analyze Convergence of LBP 

Various approaches have been presented to derive convergence conditions for the
sum-product algorithm, including the Gibbs measure [Tatikonda and Jordan (2002)], the equivalent
minimax problem [Heskes (2004)], and the contraction property of LBP updates [Ihler et al.
(2005); Mooij and Kappen (2007)]. Tatikonda and Jordan (2002) proved that, when the
Gibbs measure on the corresponding computation tree is unique, LBP converges to a unique
fixed point. Heskes (2004) proved that, when the minimum of the Bethe free energy is unique,
there is a unique fixed point for LBP. Ihler et al. (2005) and Mooij and Kappen (2007) used
a similar methodology by applying a measure on potential functions. They proved that when
the LBP update is a contractive mapping, LBP will converge. They both compared their
convergence results with those of Tatikonda and Jordan (2002) and Heskes (2004), and showed
that their results are stronger. Mooij and Kappen (2007) further showed that they derived
more general results than Ihler et al. (2005). Motivated by the discussion in Ihler et al.
(2005) and Mooij and Kappen (2007), and based on the framework of Ihler et al. (2005),
we use a new measure on message errors of LBP in order to obtain distance bounds and
accuracy bounds.

Our contributions are as follows: 

1. We present tight upper and lower bounds for the multiplicative message error $e(x)$ in
Section 2.5. Furthermore, based on these bounds, we derive tight uniform and non-uniform
distance bounds for beliefs $B(x)$ in Section 3, which help
to tighten the accuracy bounds between beliefs and true marginals in Section 5 and to correct
the upper-bound on message residuals for residual scheduling in Section 6.

2. We investigate the relation between convergence of LBP and the sparsity and
walk-summability of graphical models in Section 4. We extend walk-summability for Gaussian
graphical models to general graphical models and compare the tightness of existing
convergence conditions.

3. We analyze the paramagnetic fixed point and two other fixed points for uniform
binary graphs using message updating functions, and present true message error variation
functions to show the dynamics of the sum-product algorithm in Section 7.



2.3 Message-Error Measures 

Define the message error as a multiplicative function $e^{i}_{ts}(x_s)$ that perturbs the fixed-point
message $m_{ts}(x_s)$. The perturbed message at iteration $i$ is hence

$$m^{i}_{ts}(x_s) = m_{ts}(x_s)\, e^{i}_{ts}(x_s).$$

Dealing with normalized messages, we define fixed-point incoming message products as

$$M_{ts}(x_t) \propto \psi_t(x_t) \prod_{u\in\Gamma_t\setminus s} m_{ut}(x_t),$$

and perturbed incoming message products as

$$M^{i}_{ts}(x_t) \propto \psi_t(x_t) \prod_{u\in\Gamma_t\setminus s} m^{i}_{ut}(x_t),$$

and incoming error products as

$$E^{i}_{ts}(x_t) = \prod_{u\in\Gamma_t\setminus s} e^{i}_{ut}(x_t).$$

We have

$$M^{i}_{ts}(x_t) \propto M_{ts}(x_t)\, E^{i}_{ts}(x_t).$$

Thus, the outgoing message error from node $t$ to node $s$ at iteration $i + 1$ is:

$$e^{i+1}_{ts}(x_s) = \frac{m^{i+1}_{ts}(x_s)}{m_{ts}(x_s)} = \frac{\int \psi_{ts}(x_t, x_s) M_{ts}(x_t) E^{i}_{ts}(x_t)\, dx_t}{\iint \psi_{ts}(x_t, x_s) M_{ts}(x_t) E^{i}_{ts}(x_t)\, dx_t\, dx_s} \cdot \frac{\iint \psi_{ts}(x_t, x_s) M_{ts}(x_t)\, dx_t\, dx_s}{\int \psi_{ts}(x_t, x_s) M_{ts}(x_t)\, dx_t}.$$
In the following, we will introduce two measures on message errors. 

2.3.1 Dynamic-Range Measure 

The dynamic-range measure of error introduced by Ihler et al. (2005) is defined as:

$$d(e^{i}_{ts}) = \max_{a,b} \sqrt{\frac{e^{i}_{ts}(a)}{e^{i}_{ts}(b)}}. \tag{5}$$

We have $d(e^{i}_{ts}) \to 1$ when $e^{i}_{ts}(x) \to 1$. In Ihler et al. (2005, Th. 8) it was shown that when
$d(\psi_{ts})^2 = \max_{a,b,c,d} \psi_{ts}(a,b)/\psi_{ts}(c,d)$ is finite, the dynamic-range measure satisfies the following
contraction:

$$d(e^{i+1}_{ts}) \le \frac{d(\psi_{ts})^2\, d(E^{i}_{ts}) + 1}{d(\psi_{ts})^2 + d(E^{i}_{ts})}. \tag{6}$$

In other words, based on the dynamic-range measure, the outgoing message error is bounded
by a non-linear function of the potential function and the incoming error product.
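The dynamic-range measure and the contraction bound above can be evaluated directly for discrete functions. The sketch below assumes functions are given as lists of positive numbers and pairwise potentials as nested lists; the helper names are hypothetical and chosen for this example only.

```python
import math


def dynamic_range(values):
    """d(f) = max over a, b of sqrt(f(a) / f(b)) for a positive function."""
    return math.sqrt(max(values) / min(values))


def potential_strength(psi):
    """d(psi) for a positive pairwise potential given as a nested list."""
    flat = [v for row in psi for v in row]
    return dynamic_range(flat)


def dynamic_range_contraction(d_psi, d_E):
    """Right-hand side of Inequality (6): a bound on the outgoing d(e_ts)."""
    return (d_psi ** 2 * d_E + 1.0) / (d_psi ** 2 + d_E)
```

For example, with `d_psi = 1.5` an incoming error product of dynamic range 4 is mapped to a bound of 1.6, while an incoming dynamic range of 1 (no error) is mapped to 1, which is the fixed point of the bound.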

2.3.2 Maximum-Error Measure 

To study the dynamics of message error propagation, dealing directly with errors is more
informative than dealing with the dynamic range. Moreover, we aim to tighten distance
bounds of LBP results by using a new error measure. We thus introduce the following
maximum multiplicative error function as an error measure:

$$\max_{x_s} e^{i+1}_{ts}(x_s) = \max_{x_s} \frac{\int \psi_{ts}(x_t, x_s) M_{ts}(x_t) E^{i}_{ts}(x_t)\, dx_t}{\int \psi_{t*}(x_t) M_{ts}(x_t) E^{i}_{ts}(x_t)\, dx_t} \cdot \frac{\int \psi_{t*}(x_t) M_{ts}(x_t)\, dx_t}{\int \psi_{ts}(x_t, x_s) M_{ts}(x_t)\, dx_t}, \tag{7}$$

where $\psi_{t*}(x_t) = \int \psi_{ts}(x_t, x_s)\, dx_s$. It is immediate that the maximum-error measure
approaches one when multiplicative errors vanish. We will show later that this error measure
satisfies the following contraction:

$$\max_{x_s} e^{i+1}_{ts}(x_s) \le \left( \frac{d(\psi_{ts})\, d(\psi_{t*})\, d(E^{i}_{ts}) + 1}{d(\psi_{ts})\, d(\psi_{t*}) + d(E^{i}_{ts})} \right)^2. \tag{8}$$

The dynamic-range measure and the maximum-error measure are equivalent when the maximum
and minimum of an error function are reciprocal. By comparison, the maximum-error measure
gives an absolute error, while the dynamic-range measure gives a relative error which is invariant
to scaling. We will show in the remainder of the paper that the maximum-error measure should
be used when we are interested in absolute errors. Furthermore, $d(\psi_{ts})$ and $d(\psi_{t*})$, both
defined through the dynamic-range measure, correspond to two types of matrix norms on $\psi_{ts}$. The
factor $d(\psi_{t*})$ in the RHS of Inequality (8) characterizes the effect of the normalization factor on
$\max_{x_s} e^{i+1}_{ts}(x_s)$. We will discuss the influence of $d(\psi_{t*})$ on error bounds in Section 2.5.
2.4 Strength of Potential Functions

Heskes (2004), Ihler et al. (2005) and Mooij and Kappen (2007) have each defined a measure of
the strength of potential functions, which helps to obtain better convergence conditions
than those related only to the topology of graphical models. In the following, we will
show the relationship between beliefs and the strength of pairwise potential functions.



2.4.1 Strength of Potential Functions in Heskes (2004)

Heskes (2004) defined $\sigma_{t,s}$ as the strength of a pairwise potential function $\psi_{ts}(x_t, x_s)$ meeting
the following equation:

$$\frac{1}{1 - \sigma_{t,s}} = \max_{x_t, x_s, x_t', x_s'} \frac{\psi_{ts}(x_t, x_s)\, \psi_{ts}(x_t', x_s')}{\psi_{ts}(x_t', x_s)\, \psi_{ts}(x_t, x_s')}.$$

This strength is related to the correlation of LBP marginals as follows:

$$\frac{B_{ts}(x_t, x_s)}{B_t(x_t)\, B_s(x_s)} \le \frac{1}{1 - \sigma_{t,s}},$$

which was then utilized to give a better convergence condition than the one depending only
on graph topology.



2.4.2 Strength of Potential Functions in Ihler et al. (2005)

Ihler et al. (2005) proposed the dynamic-range measure $d(\psi_{ts})$ as the strength of potential
functions $\psi_{ts}(x_t, x_s)$. Let us restate the definition of the strength of potential functions and
its relationship with message errors in Section 2.3.1 as follows:

$$d(\psi_{ts}) = \max_{x_t, x_s, x_t', x_s'} \sqrt{\frac{\psi_{ts}(x_t, x_s)}{\psi_{ts}(x_t', x_s')}}, \qquad d(e_{ts}) \le \frac{d(\psi_{ts})^2\, d(E_{ts}) + 1}{d(\psi_{ts})^2 + d(E_{ts})}.$$

By considering the single node potentials $\psi_t(x_t)$ and $\psi_s(x_s)$, Ihler et al. (2005) weakened the
strength of pairwise potential functions by using the following dynamic-range measure:

$$d(\psi_{ts})^2 = \min_{\psi_t, \psi_s} d(\psi_{ts}\psi_t\psi_s)^2 = \sup_{x_t, x_s, x_t', x_s'} \sqrt{\frac{\psi_{ts}(x_t, x_s)\, \psi_{ts}(x_t', x_s')}{\psi_{ts}(x_t', x_s)\, \psi_{ts}(x_t, x_s')}}. \tag{9}$$

We will apply the strength of potential functions in Equation (9) in our following results.

2.4.3 Strength of Potential Functions in Mooij and Kappen (2007)



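Mooij and Kappen (2007) mentioned a measure of the strength of the potential function $\psi_{ts}(x_t, x_s)$,
which is defined as:

$$N(\psi_{ts}) = \max_{x_t \ne x_t',\, x_s \ne x_s'} \frac{\sqrt{\dfrac{\psi_{ts}(x_t, x_s)\, \psi_{ts}(x_t', x_s')}{\psi_{ts}(x_t', x_s)\, \psi_{ts}(x_t, x_s')}} - 1}{\sqrt{\dfrac{\psi_{ts}(x_t, x_s)\, \psi_{ts}(x_t', x_s')}{\psi_{ts}(x_t', x_s)\, \psi_{ts}(x_t, x_s')}} + 1} = \frac{1 - \sqrt{1 - \sigma_{t,s}}}{1 + \sqrt{1 - \sigma_{t,s}}}. \tag{10}$$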
They defined the log dynamic-range measure as the metric of errors. Let $\lambda_{ts}$ be the log message
reparameterization of message $m_{ts}$. That is,

$$\lambda_{ts}(x_s) = \log m_{ts}(x_s).$$

Denote by $\Delta\lambda$ the difference of log messages. Thus, writing $\tilde{m}_{ts}$ for the perturbed message, we have

$$\Delta\lambda_{ts}(x_s) = \log \tilde{m}_{ts}(x_s) - \log m_{ts}(x_s) = \log e_{ts}(x_s).$$

By the quotient norm and Equation (41) in Mooij and Kappen (2007), we have the following
metric of error:

$$\|\Delta\lambda_{ts}\| = \frac{1}{2} \sup_{x_s, x_s'} |\Delta\lambda_{ts}(x_s) - \Delta\lambda_{ts}(x_s')| = \log d(e_{ts}). \tag{11}$$



Using the quotient mapping approach of the parallel LBP update in Mooij and Kappen
(2007), we will find the relationship between the strength of potential functions in
Equation (10) and the metric of message errors in Equation (11) in the following.

Because $\|\Delta\lambda_{ts}\| \le \sum_{u\in\Gamma_t\setminus s} \left\| \frac{\partial \lambda_{ts}}{\partial \lambda_{ut}} \right\| \|\Delta\lambda_{ut}\|$ and $\left\| \frac{\partial \lambda_{ts}}{\partial \lambda_{ut}} \right\| \le N(\psi_{ts})$ by Equations (36-45) in
Mooij and Kappen (2007), we have

$$\log d(e_{ts}) \le N(\psi_{ts}) \sum_{u\in\Gamma_t\setminus s} \log d(e_{ut}) \le N(\psi_{ts}) \log d(E_{ts}),$$

or, $d(e_{ts}) \le d(E_{ts})^{N(\psi_{ts})}$.

We can observe that the smaller $N(\psi_{ts})$ is, the smaller $d(e_{ts})$ is, and therefore the faster
the contraction of errors. The previous inequality reveals another result on the contractive property
of message errors besides the one in Equation (6).
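To make the strength measure $N(\psi_{ts})$ concrete, the sketch below evaluates it for a discrete potential from the maximal cross-ratio, assuming the form $(\sqrt{r} - 1)/(\sqrt{r} + 1)$ with $r$ the cross-ratio of four potential entries, which is equivalent to $\tanh(\tfrac{1}{4}\log r)$. The function names are hypothetical, and the nested-list representation of the potential is an assumption of this example.

```python
import itertools
import math


def cross_ratio_max(psi):
    """Largest value of psi[a][b] * psi[c][d] / (psi[c][b] * psi[a][d])."""
    n_t, n_s = len(psi), len(psi[0])
    best = 1.0
    for a, c in itertools.product(range(n_t), repeat=2):
        for b, d in itertools.product(range(n_s), repeat=2):
            ratio = psi[a][b] * psi[c][d] / (psi[c][b] * psi[a][d])
            best = max(best, ratio)
    return best


def mooij_kappen_strength(psi):
    """N(psi) computed from the square root of the maximal cross-ratio."""
    root = math.sqrt(cross_ratio_max(psi))
    return (root - 1.0) / (root + 1.0)


def error_bound(d_E, n_psi):
    """Bound d(e_ts) <= d(E_ts) ** N(psi_ts) on the outgoing dynamic range."""
    return d_E ** n_psi
```

For the symmetric binary potential with diagonal entries 0.7 and off-diagonal entries 0.3, this gives a strength of 0.4, so an incoming dynamic range of 2 is contracted to at most `2 ** 0.4`.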

In the following, we use the maximum-error measure in Equation ([7]) to explore upper 
and lower bounds on message errors, and upper bounds on the distances between beliefs. 

2.5 Upper- and Lower-Bounds on Message Errors 

We have the multiplicative error function as follows:

$$e^{i+1}_{ts}(x_s) = \frac{\int \psi_{ts}(x_t, x_s) M_{ts}(x_t) E^{i}_{ts}(x_t)\, dx_t}{\int \psi_{t*}(x_t) M_{ts}(x_t) E^{i}_{ts}(x_t)\, dx_t} \times \frac{\int \psi_{t*}(x_t) M_{ts}(x_t)\, dx_t}{\int \psi_{ts}(x_t, x_s) M_{ts}(x_t)\, dx_t},$$

where $\psi_{t*}(x_t) = \int \psi_{ts}(x_t, x_s)\, dx_s$. We will show that the error function is upper- and
lower-bounded.

Theorem 1 Multiplicative outgoing errors are bounded as:

$$\left( \frac{d(\psi_{ts})\, d(\psi_{t*}) + d(E^{i}_{ts})}{d(\psi_{ts})\, d(\psi_{t*})\, d(E^{i}_{ts}) + 1} \right)^2 \le \min_{x_s} e^{i+1}_{ts}(x_s) \le e^{i+1}_{ts}(x_s) \le \max_{x_s} e^{i+1}_{ts}(x_s) \le \left( \frac{d(\psi_{ts})\, d(\psi_{t*})\, d(E^{i}_{ts}) + 1}{d(\psi_{ts})\, d(\psi_{t*}) + d(E^{i}_{ts})} \right)^2.$$

The proof appears in Appendix A.

Let us use the following notation for our upper-bound:

$$\Delta_1 = \left( \frac{d(\psi_{ts})\, d(\psi_{t*})\, d(E^{i}_{ts}) + 1}{d(\psi_{ts})\, d(\psi_{t*}) + d(E^{i}_{ts})} \right)^2. \tag{12}$$
From (Ihler et al., 2005, Th. 2 and Th. 8), we can derive their upper-bound for $\max_{x_s} e^{i+1}_{ts}(x_s)$:

$$\max_{x_s} e^{i+1}_{ts}(x_s) \le d(e^{i+1}_{ts})^2 \le \left( \frac{d(\psi_{ts})^2\, d(E^{i}_{ts}) + 1}{d(\psi_{ts})^2 + d(E^{i}_{ts})} \right)^2 = \Delta_2. \tag{13}$$



Theorem 2 The upper bound $\Delta_1$ on the multiplicative error provided in Theorem 1 is
tighter than the upper bound $\Delta_2$ from (Ihler et al., 2005, Th. 2 and Th. 8): $\Delta_1 \le \Delta_2$.

Proof Because $\Delta_1$ in (12) is increasing in $d(\psi_{t*})$, we conclude that (12) implies (13), i.e.,
$\Delta_1 \le \Delta_2$, because

$$d(\psi_{t*}) = \max_{a,b} \sqrt{\frac{\psi_{t*}(a)}{\psi_{t*}(b)}} = \max_{a,b} \sqrt{\frac{\int \psi_{ts}(a, x_s)\, dx_s}{\int \psi_{ts}(b, x_s)\, dx_s}} \le \max_{a,b} \sqrt{\max_{c,d} \frac{\psi_{ts}(a, c)}{\psi_{ts}(b, d)}} = \max_{a,b,c,d} \sqrt{\frac{\psi_{ts}(a, c)}{\psi_{ts}(b, d)}} = d(\psi_{ts}).$$

We can see how $d(\psi_{t*})$ tightens the upper-bound by analyzing the log-distance between
$\Delta_1$ and $\Delta_2$. Let $d(\psi_{t*}) = K d(\psi_{ts})$, where $1/d(\psi_{ts}) \le K \le 1$. The log-distance
between $\Delta_1$ and $\Delta_2$ is then

$$\Delta(K) = \log \Delta_1 - \log \Delta_2 = 2 \log \left\{ \frac{K d(\psi_{ts})^2\, d(E^{i}_{ts}) + 1}{K d(\psi_{ts})^2 + d(E^{i}_{ts})} \times \frac{d(\psi_{ts})^2 + d(E^{i}_{ts})}{d(\psi_{ts})^2\, d(E^{i}_{ts}) + 1} \right\}.$$

It is easy to verify that the first derivative of $\Delta(K)$ with respect to $K$ is positive when $d(E^{i}_{ts}) > 1$.
Since $\Delta(K) \le 0$, the largest gap between $\Delta_1$ and $\Delta_2$ is obtained at $K = 1/d(\psi_{ts})$. In other words,
our upper-bound $\Delta_1$ improves on $\Delta_2$ the most when $d(\psi_{t*}) = 1$.
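The comparison between the two upper bounds can be checked numerically. The sketch below assumes the squared-ratio forms of $\Delta_1$ in (12) and $\Delta_2$ in (13); the function names `delta_1`, `delta_2` and `lower_bound` are hypothetical helpers introduced for this example.

```python
def delta_1(d_psi, d_psi_star, d_E):
    """Upper bound of Theorem 1 on the maximum multiplicative error."""
    a = d_psi * d_psi_star
    return ((a * d_E + 1.0) / (a + d_E)) ** 2


def delta_2(d_psi, d_E):
    """Upper bound derived from the dynamic-range contraction, Equation (13).

    It coincides with delta_1 when d(psi_t*) equals d(psi_ts).
    """
    return delta_1(d_psi, d_psi, d_E)


def lower_bound(d_psi, d_psi_star, d_E):
    """Lower bound of Theorem 1: the reciprocal of the upper bound."""
    return 1.0 / delta_1(d_psi, d_psi_star, d_E)
```

As a worked case, take $d(\psi_{ts}) = 2$ and an incoming error product with dynamic range 3. With $d(\psi_{t*}) = 1$ the bound is $\Delta_1 = (7/5)^2 = 1.96$, whereas $\Delta_2 = (13/7)^2 \approx 3.45$, which illustrates the largest gap between the two bounds for this potential strength.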

3. Distance Bounds on Beliefs 

In the study of convergence, we are interested in how beliefs vary at each iteration
when LBP fails to converge. We will show that beliefs are bounded given the strength of
potential functions and the structure of the graph. In the following, we present our
uniform and non-uniform distance bounds on beliefs. Based on those bounds,
we further present uniform and non-uniform convergence conditions
for synchronous LBP.

3.1 Uniform Distance Bound 

Corollary 3 (Uniform Distance Bound)

The log-distance bound of fixed points on belief at node $s$ is

$$\sum_{t\in\Gamma_s} \log \left( \frac{d(\psi_{ts})\, d(\psi_{t*})\, \varepsilon + 1}{d(\psi_{ts})\, d(\psi_{t*}) + \varepsilon} \right)^2,$$

where $\varepsilon$ should satisfy

$$\log \varepsilon = \max_{(s,p)\in E} \sum_{t\in\Gamma_s\setminus p} \log \left( \frac{d(\psi_{ts})\, d(\psi_{t*})\, \varepsilon + 1}{d(\psi_{ts})\, d(\psi_{t*}) + \varepsilon} \right)^2.$$

The proof appears in Appendix A.
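The fixed-point equation for $\varepsilon$ can be solved numerically by repeated substitution. The following is a minimal sketch under stated assumptions: the graph is given as a neighbor dictionary, `strength[(t, s)]` holds the product $d(\psi_{ts})\, d(\psi_{t*})$, each bound term enters squared, and the iteration starts from the limit of the map as $\varepsilon \to \infty$. The function names, the data layout and the starting point are choices made for this example, not taken from the paper.

```python
import math


def udb_epsilon(nbrs, strength, iters=2000):
    """Iterate log eps <- max over (s, p) of the summed squared bound terms."""
    def step(log_eps):
        eps = math.exp(log_eps)
        best = 0.0
        for s in nbrs:
            for p in nbrs[s]:
                total = sum(
                    2.0 * math.log((strength[(t, s)] * eps + 1.0)
                                   / (strength[(t, s)] + eps))
                    for t in nbrs[s] if t != p)
                best = max(best, total)
        return best

    # Start from an upper limit: each term tends to 2 * log(strength).
    log_eps = max(sum(2.0 * math.log(strength[(t, s)]) for t in nbrs[s])
                  for s in nbrs)
    for _ in range(iters):
        log_eps = step(log_eps)
    return math.exp(log_eps)


def udb_at_node(s, nbrs, strength, eps):
    """Log-distance bound on the belief at node s for a given eps."""
    return sum(
        2.0 * math.log((strength[(t, s)] * eps + 1.0) / (strength[(t, s)] + eps))
        for t in nbrs[s])
```

When the iteration collapses to $\varepsilon = 1$ the bound at every node is zero, which corresponds to a unique fixed point; a limit with $\varepsilon > 1$ gives a strictly positive bound on the distance between fixed points.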

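Let us reintroduce the error bound-variation function used in the proof of Corollary 3:

$$G^{1}_{sp}(\log \varepsilon) = \log \prod_{t\in\Gamma_s\setminus p} \left( \frac{d(\psi_{ts})\, d(\psi_{t*})\, \varepsilon + 1}{d(\psi_{ts})\, d(\psi_{t*}) + \varepsilon} \right)^2 - \log \varepsilon, \quad \varepsilon \ge 1. \tag{14}$$

Adopting the upper-bound $\Delta_2$ in (13), the error bound-variation function is:

$$G^{2}_{sp}(\log \varepsilon) = \log \prod_{t\in\Gamma_s\setminus p} \left( \frac{d(\psi_{ts})^2\, \varepsilon + 1}{d(\psi_{ts})^2 + \varepsilon} \right)^2 - \log \varepsilon, \quad \varepsilon \ge 1. \tag{15}$$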

Those error bound-variation functions describe the upper-bound on the variation of maximal
message errors throughout the belief network. We can see that $G^{1}_{sp}(\log \varepsilon) \le G^{2}_{sp}(\log \varepsilon)$. In
other words, the error bound-variation function using our upper-bound $\Delta_1$ is tighter than
that using the upper-bound $\Delta_2$ of Ihler et al. (2005), which is illustrated in Fig. 2. However, in
Ihler et al. (2005), the following error bound-variation function is used:

$$G^{3}_{sp}(\log \varepsilon') = \log \prod_{t\in\Gamma_s\setminus p} \left( \frac{d(\psi_{ts})^2\, \varepsilon' + 1}{d(\psi_{ts})^2 + \varepsilon'} \right) - \log \varepsilon', \tag{16}$$

where $\varepsilon'$ is an upper-bound on the dynamic-range measure $d(E_{ts})$. Since our $\varepsilon$ is an
upper-bound on the maximum-error measure $\max E_{ts}$, it is hard to compare $G^{1}_{sp}(\log \varepsilon)$ and $G^{3}_{sp}(\log \varepsilon')$.
In other words, we cannot say that our Uniform Distance Bound in Corollary 3 is better than
that in (Ihler et al., 2005, Theorem 13).

When the error bound-variation function is always less than zero, the maximum of the error
bounds decreases after each iteration of LBP. In other words, LBP will converge. Therefore,
our uniform distance bound in Corollary 3 leads to a sufficient condition for convergence
of LBP.

Theorem 4 (Uniform Convergence Condition)

Based on the maximum-error measure, the sufficient condition for the sum-product algorithm
to converge to a unique fixed point is

$$\max_{(s,p)\in E} \sum_{t\in\Gamma_s\setminus p} \frac{d(\psi_{ts})\, d(\psi_{t*}) - 1}{d(\psi_{ts})\, d(\psi_{t*}) + 1} < \frac{1}{2}.$$

The proof appears in Appendix A.

Since we cannot compare $G^{1}_{sp}(\log \varepsilon)$ and $G^{3}_{sp}(\log \varepsilon')$ directly because $\varepsilon$ and $\varepsilon'$ correspond
to different measures, let us take the maximum of the two measures and deal with it as
a new measure. Specifically, let $\bar{\varepsilon} = \max\{\varepsilon, \varepsilon'\}$. After some calculation, we can find that
$G^{1}_{sp}(\log \bar{\varepsilon})$ is greater than $G^{3}_{sp}(\log \bar{\varepsilon})$. In other words, $G^{3}_{sp}(\log \bar{\varepsilon})$ is tighter than $G^{1}_{sp}(\log \bar{\varepsilon})$.
Therefore, the convergence condition derived from $G^{3}_{sp}(\log \bar{\varepsilon})$ will be better. The following
lemma provides a proof for this observation.

Figure 2: Error bound-variation functions versus true error-variation function for the
local graph of node $s$. Potential functions on edges $(t_1, s)$, $(t_2, s)$, $(t_3, s)$ are the
same, where $\eta = 0.7$. We also impose the same incoming error product $E_{ts}$
on nodes $t_1, t_2, t_3$. The dotted curves depict the true error variation functions,
$\{\log \max_x E_{sp}(x) - \log \max_x E_{ts}(x),\ t \in \Gamma_s \setminus p\}$, which are enveloped by our error
bound-variation function $G^{1}_{sp}(\log \varepsilon)$.

Figure 3: Four simple graphical models: (a) a four-node fully connected graph; (b) a partial
graph that has one less edge than (a); (c) a nine-node graph with uniform degree;
and (d) a 3 x 3 grid that is a partial graph of (c).
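Lemma 5 Our sufficient condition

$$\sum_{t\in\Gamma_s\setminus p} \frac{d(\psi_{ts})\, d(\psi_{t*}) - 1}{d(\psi_{ts})\, d(\psi_{t*}) + 1} < \frac{1}{2}$$

is worse than the sufficient condition in Ihler et al., which is

$$\sum_{t\in\Gamma_s\setminus p} \frac{d(\psi_{ts})^2 - 1}{d(\psi_{ts})^2 + 1} < 1.$$

Proof For each term, since $1 \le d(\psi_{t*}) \le d(\psi_{ts})$,

$$2\, \frac{d(\psi_{ts})\, d(\psi_{t*}) - 1}{d(\psi_{ts})\, d(\psi_{t*}) + 1} \ge 2\, \frac{d(\psi_{ts}) - 1}{d(\psi_{ts}) + 1} \ge \frac{d(\psi_{ts})^2 - 1}{d(\psi_{ts})^2 + 1},$$

where the last step follows from $(d(\psi_{ts}) - 1)^2 \ge 0$.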
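The term-by-term comparison behind Lemma 5 can be spot-checked numerically. The sketch below assumes the two conditions have the summand forms stated in the lemma; `our_term` and `ihler_term` are hypothetical helper names used only in this example.

```python
def our_term(d_psi, d_psi_star):
    """Summand of the maximum-error condition, to be compared against 1/2."""
    a = d_psi * d_psi_star
    return (a - 1.0) / (a + 1.0)


def ihler_term(d_psi):
    """Summand of the dynamic-range condition, to be compared against 1."""
    return (d_psi ** 2 - 1.0) / (d_psi ** 2 + 1.0)


def our_condition_is_stricter(d_psi, d_psi_star):
    """True when twice our summand is at least the dynamic-range summand."""
    return 2.0 * our_term(d_psi, d_psi_star) >= ihler_term(d_psi) - 1e-12
```

Because the threshold of the first condition is 1/2 and that of the second is 1, the first condition is the stricter one whenever `our_condition_is_stricter` holds for every edge, which the check suggests is the case for all $1 \le d(\psi_{t*}) \le d(\psi_{ts})$.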



Our failure to improve the uniform convergence condition by using the maximum-error
measure shows that the dynamic-range measure is better than the maximum-error measure with
respect to the sensitivity of the measure to convergence. Nevertheless, as for the upper
bound on a multiplicative message error $e_{ts}(x)$, the maximum-error measure gives a tighter
result, which is shown in Theorem 2. Furthermore, the maximum-error measure may provide
better distance bounds for beliefs.

Inspired by the sensitivity of dynamic-range measure to convergence, we present the 
following improved uniform distance bound, which first calculates the fixed-point values of 
error bounds in dynamic-range measure, and then computes the error bounds among beliefs 
in maximum-error measure. 



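Corollary 6 (Improved Uniform Distance Bound)

The log-distance bound of fixed points on belief at node $s$ is

$$\sum_{t\in\Gamma_s} \log \left( \frac{d(\psi_{ts})\, d(\psi_{t*})\, \varepsilon + 1}{d(\psi_{ts})\, d(\psi_{t*}) + \varepsilon} \right)^2,$$

where $\varepsilon$ should satisfy

$$\log \varepsilon = \max_{(s,p)\in E} \sum_{t\in\Gamma_s\setminus p} \log \frac{d(\psi_{ts})^2\, \varepsilon + 1}{d(\psi_{ts})^2 + \varepsilon}.$$

Proof Using the approach in (Ihler et al., 2005, Theorem 12) to obtain distance bounds on
incoming error products in the dynamic-range measure and applying our Theorem 1, we obtain
our corollary. ■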



Let us see how our uniform distance bound and improved uniform distance bound perform
for the graphical models in Fig. 3 by comparison to the fixed-point distance bound in Ihler et al.
(2005). Let all the pairwise potential functions be $\begin{pmatrix} \eta & 1-\eta \\ 1-\eta & \eta \end{pmatrix}$, where $\eta > 0.5$, and all
the single node potentials be [...]. Therefore, $d(\psi_{ts}) = \sqrt{\eta/(1-\eta)}$ and $d(\psi_{t*}) = 1$ for
$\forall (t,s) \in E$.

We compare the following bounds in our simulations: UDB, our uniform distance bound
in Corollary 3; Improved-UDB, our improved uniform distance bound in Corollary 6; and
Ihler-UDB, the fixed-point distance bound in (Ihler et al., 2005, Theorem 13). Fig. 4 to Fig. 7 illustrate
the performance of those bounds for the graphs in Figs. 3(a), (c), (b) and (d), respectively.

Graphs in Figs. 3(a) and (c) are uniform (uniform degrees, uniform potential functions).
Given a specific $\eta$, all nodes have the same distance bound. The critical value of $\eta$ is the
value beyond which LBP will not converge. For those two graphs, the empirical critical
values of $\eta$ with respect to the convergence of LBP are 0.75 and 0.67 respectively. We
can see that, for various $\eta$'s, our Improved-UDBs are very close to the true errors between
beliefs. Our UDBs become tighter when $\eta$ increases, while Ihler-UDBs become looser. From
Fig. 4 and Fig. 5 we can see that, compared to Ihler-UDB, our UDB requires stricter critical
values of $\eta$ to ensure that the error bounds are zero. Specifically, for Fig. 4, when $\eta = 0.745$,
our UDBs are non-zero and Ihler-UDBs are zero; hence, our UDB requires $\eta < 0.745$
for the convergence of LBP, while Ihler-UDB only requires $\eta < 0.75$. Nevertheless, the
critical values by our UDB are 0.735 for Fig. 3(a) and 0.66 for Fig. 3(c), which are close to
the empirical critical values. Based on our UDB and Ihler-UDB, our Improved-UDBs
approach zero when $\eta$ approaches 0.75 and give the tightest distance bounds for any $\eta$.

Figure 4: True distance, uniform distance bounds and non-uniform distance bounds for the
graph in Fig. 3(a) with various $\eta$'s. The empirical critical value of $\eta$ for LBP to
converge is $\eta < 0.75$.
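The critical values quoted above can be reproduced from the two uniform convergence conditions by bisection on $\eta$. The sketch assumes the symmetric binary potential with $d(\psi_{ts}) = \sqrt{\eta/(1-\eta)}$ and $d(\psi_{t*}) = 1$, a node degree of 3 for the four-node fully connected graph, and a node degree of 4 for the nine-node uniform graph; the degree of 4 is an assumption of this example, as are the function names.

```python
import math


def our_condition(eta, degree):
    """Maximum-error condition: (degree - 1) * (a - 1) / (a + 1) < 1/2."""
    a = math.sqrt(eta / (1.0 - eta))  # d(psi_ts), with d(psi_t*) = 1
    return (degree - 1) * (a - 1.0) / (a + 1.0) < 0.5


def ihler_condition(eta, degree):
    """Dynamic-range condition: (degree - 1) * (a^2 - 1) / (a^2 + 1) < 1."""
    a2 = eta / (1.0 - eta)  # d(psi_ts) squared
    return (degree - 1) * (a2 - 1.0) / (a2 + 1.0) < 1.0


def critical_eta(condition, degree, lo=0.5, hi=1.0 - 1e-12, steps=200):
    """Largest eta (by bisection) for which the condition still holds."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if condition(mid, degree):
            lo = mid
        else:
            hi = mid
    return lo
```

Under these assumptions, `critical_eta(our_condition, 3)` is 25/34 (about 0.735) and `critical_eta(our_condition, 4)` is 49/74 (about 0.66), while the dynamic-range condition gives 0.75 and 2/3, in line with the values discussed in the text.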

3.2 Non-Uniform Distance Bound 

Fig. 3(b) and Fig. 3(d) are non-uniform graphs. Because uniform distance bounds are
computed locally, beliefs on nodes with different topologies will have different error
bounds, which can be observed from Fig. 6 and Fig. 7. We can also find that when the
true errors are zero, the uniform bounds are not all zero. In other words, $\eta$ must be smaller
than the empirical critical value to ensure that the largest uniform distance bound is zero.
Furthermore, in such cases, uniform convergence conditions derived from uniform distance
bounds will not perform as well as for uniform graphs. Therefore, when every loop contains
potentials with various strengths and each node has a different topology, we present the
following non-uniform distance bound and improved non-uniform distance bound.

Figure 5: True distance, uniform distance bounds and non-uniform distance bounds for the
graph in Fig. 3(c) with various $\eta$'s. The empirical critical value of $\eta$ for LBP to
converge is $\eta < 0.67$.

Corollary 7 (Non-uniform Distance Bound) 

The non-uniform log-distance bound of fixed points on belief at node s after n > 1 iterations 
is 



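Figure 6: True distance, uniform distance bounds and non-uniform distance bounds for the
graph in Fig. 3(b) with various $\eta$'s. The empirical critical value of $\eta$ for LBP to
converge is $\eta < 0.83$.

where $\varepsilon^{i}_{ts}$ is updated by

$$\log \varepsilon^{i+1}_{ts} = \sum_{u\in\Gamma_t\setminus s} \log \left( \frac{d(\psi_{ut})\, d(\psi_{u*})\, \varepsilon^{i}_{ut} + 1}{d(\psi_{ut})\, d(\psi_{u*}) + \varepsilon^{i}_{ut}} \right)^2$$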



with initial condition 

ver u \t 

Proof The result can be easily proved from Corollary 3 by defining the error bound-variation
function in (14) as follows:

G ts (iog £ L) = iog n A n ,(4 7 1 )-iog4= £ M ^^rr?^ ) 2 - lo g^- 



u£F t \s u€T t \s 



d(i/i ut )d(ip u *) + e l ut 



Similarly, based on the fact that the dynamic-range measure gives a better convergence
condition than the maximum-error measure, we improve the previous non-uniform distance
bound in the following.

Corollary 8 (Improved Non-uniform Distance Bound) 

The improved non-uniform log-distance bound of fixed points on belief at node s after n > 1 
Figure 7: True distance, uniform distance bounds and non-uniform distance bounds for the
graph in Fig. 3(d) with various $\eta$'s. The empirical critical value of $\eta$ for LBP to
converge is $\eta < 0.79$.



iterations is 



where e\ s is updated by 



ter s 



log e 



«er t \s 



+ £ 



i-1 



with initial condition $\log \varepsilon^1_{ut} = \sum_{v \in \Gamma_u \setminus t} \log d(\psi_{vu})^2$.

Proof Using the approach in (Ihler et al., 2005, Theorem 14) to obtain distance bounds on
incoming error products in the dynamic-range measure and applying our Theorem 1, we obtain
our corollary. ■



Let us examine the performance of our non-uniform distance bound and improved non-uniform
distance bound for the graphs in Fig. 3, compared with the non-uniform distance bound
in (Ihler et al., 2005, Theorem 14). We denote the bounds in our simulation as follows:
NUDB, our non-uniform distance bound in Corollary 7; Improved-NUDB, our improved
non-uniform distance bound in Corollary 8; Ihler-NUDB, the non-uniform distance bound
in (Ihler et al., 2005, Theorem 14).

For the uniform graphs in Fig. 3(a) and (c), NUDB performs exactly the same as UDB.
However, for the non-uniform graphs in Fig. 3(b) and (d), because NUDB propagates error
bounds throughout the whole graph rather than over a local neighborhood, NUDBs are
tighter than UDBs, which can be observed from Fig. 6 and Fig. 7. For various $\eta$'s, our
Improved-NUDBs always approach the true errors. Therefore, when our Improved-NUDB
is zero, $\eta$ almost equals the empirical critical value that ensures convergence of LBP. Though
worse than Improved-NUDB, our NUDB performs better than Ihler-NUDB when $\eta$ is far
away from the region of convergence.

3.2.1 Non-Uniform Convergence 

Based on our Improved-NUDB or Ihler-NUDB, a sufficient convergence condition of LBP 
can be derived, which is based on the dynamic-range measure of propagating errors. 

For each cycle-involved vertex $v$, $T(G, v)$ is the corresponding computation tree. Let $V$
be the set of vertices in the computation tree. For $w_i \in V$, $i = 0, \ldots, |V| - 1$, $l(w_i)$ is the
labelling function which maps $w_i$ to the original vertex in $G$. Let $l(w_0) = v$.

Theorem 9 (Non-Uniform Convergence Condition) 

For a graphical model G(V,E), {T(G, v),v E V} is the set of computation trees. Let E 
denote the set of directed edges. For each T(G,v),v G V, given vu S E, rl vu denotes an 
expression on edge vu: 

(17) 

where T Wi is the set of neighbors of Wi. The non-uniform sufficient condition for the sum- 
product algorithm to converge to a local stable fixed point is: 

maxHvu < 1. 

vu£E 



The proof appears in Appendix A. Depending on the type of computation tree, the non-uniform
convergence condition will be called the non-uniform convergence condition based on the
$N$-th level Bethe tree, on the infinite Bethe tree, or on the SAW tree. Our non-uniform convergence
condition based on the infinite Bethe tree is equivalent to (Ihler et al., 2005, Theorem 14).

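When a graph has uniform potential functions with strength $d(\psi)$, to ensure convergence,
it is sufficient to have

$$\max_{vu \in E} \sum_{w_i \in \Gamma_v \setminus u} \frac{d(\psi)^2 - 1}{d(\psi)^2 + 1} \sum_{w_j \in \Gamma_{w_i} \setminus v} \frac{d(\psi)^2 - 1}{d(\psi)^2 + 1} \cdots \sum_{w_r \in \Gamma_{w_q} \setminus w_p} \frac{d(\psi)^2 - 1}{d(\psi)^2 + 1} < 1. \qquad (18)$$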

Let us apply our non-uniform convergence condition based on the SAW tree to the graphs
in Fig. 3(b) and (d) with uniform potential functions as in the previous simulations. For
the graph in Fig. 3(b), we obtain the critical value $\eta < 0.78$ for convergence of LBP, which
is closer to the empirical value $\eta < 0.83$ than the value $\eta < 0.75$ obtained by the uniform
convergence condition. For the graph in Fig. 3(d), we obtain the critical value $\eta < 0.77$,
while the empirical value is $\eta < 0.79$ and the critical value obtained by the uniform convergence
condition is $\eta < 0.67$. Therefore, our non-uniform convergence condition is tighter than our
uniform convergence condition. However, since our non-uniform convergence condition is
derived from (Ihler et al., 2005, Theorem 14), we do not improve the convergence condition.
Rather than in the form of a distance bound as in (Ihler et al., 2005, Theorem 14), we express the
convergence condition explicitly, which will be used in our later analysis of walk-summability
of graphical models. Furthermore, we improve the distance bounds between beliefs, which is
useful in tightening the accuracy bounds in Section 5.



4. Convergence of Loopy Belief Propagation 
4.1 Sparsity and Convergence 

Computing our non-uniform convergence condition in Theorem 9 is not easy when the
graph is not sparse or not symmetric. Nevertheless, Theorem 9 can be used to deduce
convergence properties of sparse graphs.

The claim that the sparser a graph is, the less strict its convergence condition is, lacks
theoretical verification. Moreover, the definition of sparse graphs is vague; therefore, to be
precise, we relate sparsity to partial graphs. Let us define partial graphs and
introduce the convergence property of such graphs in the following.

Definition 10 (Walk) 

In a graph $G(V,E)$, a walk of length $l$ is a sequence of nodes $w = (v_0, v_1, \ldots, v_l)$, $v_i \in V$,
such that each step $(v_i, v_{i+1})$ of the walk corresponds to an edge in $E$.

Definition 11 (Prime Cycle) 

A closed walk is called a prime cycle if it is not backtracking and not a repeated concatenation 
of a shorter closed walk. 

Definition 12 (Reduction) 

A walk composed of two edges $(v_1, v_2)$ and $(v_2, v_3)$ can be reduced to a walk composed of one
edge $(v_1, v_3)$, where $\psi_{v_1 v_3}(x_{v_1}, x_{v_3}) = \int \psi_{v_1 v_2}(x_{v_1}, x_{v_2}) \psi_{v_2 v_3}(x_{v_2}, x_{v_3}) \, dx_{v_2}$, when there is
no branch on the walk.

Definition 13 (Extension)

A walk composed of one edge $(v_1, v_3)$ can be extended to a walk composed of two edges $(v_1, v_2)$
and $(v_2, v_3)$, where $\int \psi_{v_1 v_2}(x_{v_1}, x_{v_2}) \psi_{v_2 v_3}(x_{v_2}, x_{v_3}) \, dx_{v_2} = \psi_{v_1 v_3}(x_{v_1}, x_{v_3})$.
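
For discrete variables the integral in the reduction becomes a sum, so reducing a two-edge walk amounts to multiplying the two potential tables. The following is a minimal sketch under that assumption (finite state spaces, potentials stored as nested lists); the helper name `reduce_walk` is hypothetical and not part of the original text.

```python
def reduce_walk(psi_12, psi_23):
    """Reduce the two-edge walk (v1, v2), (v2, v3) to the single edge (v1, v3).

    Assumes discrete states, so the integral over x_{v2} becomes a sum:
    psi_13[a][c] = sum_b psi_12[a][b] * psi_23[b][c].
    """
    n_mid = len(psi_23)
    n_out = len(psi_23[0])
    return [
        [sum(row[b] * psi_23[b][c] for b in range(n_mid)) for c in range(n_out)]
        for row in psi_12
    ]


# Example: two identical binary potentials with eta = 0.7 (illustrative value).
eta = 0.7
psi = [[eta, 1 - eta], [1 - eta, eta]]
psi_13 = reduce_walk(psi, psi)
```

In this example the reduced potential has diagonal entries 0.58 and off-diagonal entries 0.42, so the reduced edge is weaker than either of the original edges.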

Definition 14 (Partial Graphs) 

For two graphical models $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ after reduction and extension, there
exists an isomorphism between graphs $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ when $V_1 = V_2$ and
$E_1 \subset E_2$. When $E_2 - E_1$ is cycle-involved, we call $G_1$ a partial graph of $G_2$ and denote it as
$G_1 \subset G_2$.
Theorem 15 (Strictness of Convergence Condition for Two Partial Graphs)

Given $G_1$ and $G_2$ as defined in Definition 14, assume that $G_1 \subset G_2$. Assume the dynamic-range
measures of potential functions for edges in $E_1$ are not greater than those of potential
functions for corresponding edges in $E_2$. Then, when LBP for $G_2(V_2, E_2)$ converges, LBP
for $G_1(V_1, E_1)$ must converge; however, the reverse implication is not true in general.

Proof Because $G_1 \subset G_2$ and $E_2 - E_1$ is cycle-involved, $T_S(G_1, v, n) \subset T_S(G_2, v, n)$.
Therefore, the expression in (17) for $G_2$ has more summands than that for $G_1$. When $G_2$
satisfies the convergence condition in Theorem 9, $G_1$ must satisfy it. However, when $G_1$
satisfies the convergence condition, $G_2$ may not satisfy it. ■

When the potential functions of a graph are uniform, we have the following corollary.

Corollary 16 (Critical Values of Convergence for Two Partial Graphs)

Given $G_1 \subset G_2$, assume uniform potential functions
$\psi_i = \begin{pmatrix} \eta_i & 1 - \eta_i \\ 1 - \eta_i & \eta_i \end{pmatrix}$, $i = 1, 2$,
on all edges. Then, the critical values for convergence of LBP satisfy $\eta_2 < \eta_1$.

Proof Because (18) for $G_2$ has more summands than that for $G_1$, we must have
$d(\psi_2) < d(\psi_1)$ to satisfy the inequality. Because $d(\psi_i) = \sqrt{\eta_i / (1 - \eta_i)}$, we get $\eta_2 < \eta_1$. ■

Our Theorem [15] and Corollary [16] can be easily extended to strictness of convergence 
condition of LBP for a set of partial graphs, and for those with uniform potential functions. 



Corollary 17 (Strictness of Convergence Condition for Set of Partial Graphs) 

Given $G_1 \subset G_2 \subset \cdots \subset G_N$, assuming the dynamic-range measures of potential functions
on isomorphic edges of those graphs are correspondingly non-decreasing in the previous
partial order, LBP convergence for $G_j$ implies LBP convergence for $G_i$, where $i < j$ and
$i, j = 1, \ldots, N$. However, the reverse implication is not true in general.

Proof For any $G_i \subset G_j$ in the set $\{G_i, 1 \le i \le N\}$, according to Theorem 15, the
convergence of $G_j$ implies the convergence of $G_i$. ■



Corollary 18 (Critical Value of Convergence for Set of Partial Graphs) 

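Given $G_1 \subset G_2 \subset \cdots \subset G_k$, assume uniform potential functions
$\psi_i = \begin{pmatrix} \eta_i & 1 - \eta_i \\ 1 - \eta_i & \eta_i \end{pmatrix}$, $1 \le i \le k$,
on all edges. Then, the critical values for convergence of LBP satisfy $\eta_k < \eta_{k-1} < \cdots < \eta_1$.

Proof For any $G_i \subset G_j$ in the set $\{G_i, 1 \le i \le k\}$, according to Corollary 16, we have
$\eta_j < \eta_i$. ■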

By our Corollary 17 on partially ordered graphs, we can conclude that graphs with fewer
cycle-induced edges are sparser and thus have a weaker convergence condition. It is
intuitively true that the strength of potential functions for Fig. 3(a) or Fig. 3(c) should be
weaker than that for Fig. 3(b) or Fig. 3(d) to ensure convergence of LBP. This observation
can be verified by our previous corollaries.
Figure 8: Diagram summarizing mildness of convergence conditions. The SAW tree is a
partial tree of the $N$-level Bethe tree; therefore, the convergence condition based on
the SAW tree is stronger.



4.2 Walk-Summability and Convergence 

Malioutov et al. (2006) related the convergence of LBP to the spectral radius of the partial
correlation matrix of a Gaussian graphical model, for which they introduced a concept called
walk-summability. We observe a similarity between the walk-summability of Gaussian graphical
models and our convergence condition for general graphical models discussed in Section 3.2.1.
Therefore, based on some existing works in the literature, we extend the walk-summability
defined in Malioutov et al. (2006) to general graphical models.

A Gaussian graphical model is defined by an undirected graph $G(V,E)$, where $V$ is
the set of nodes and $E$ is the set of edges, and a set of jointly Gaussian random variables
$\{x_i, i \in V\}$. The joint density function is defined as follows:

$$p(x) \propto \exp\{-\tfrac{1}{2} x^T J x + h^T x\},$$

where $J$ is a symmetric and positive definite matrix called the information matrix and $h$ is a
potential vector. The partial correlation coefficient between random variables $x_i$ and $x_j$ is
defined as follows:

$$r_{ij} = \frac{\mathrm{cov}(x_i, x_j \mid x_{V \setminus ij})}{\sqrt{\mathrm{var}(x_i \mid x_{V \setminus ij}) \, \mathrm{var}(x_j \mid x_{V \setminus ij})}} = -\frac{J_{ij}}{\sqrt{J_{ii} J_{jj}}}.$$



A walk is defined in Definition 10. The weight $\phi(w)$ of a walk $w = (v_0, v_1, \ldots, v_{l(w)})$ with
length $l(w)$ is defined as:

$$\phi(w) = \prod_{k=1}^{l(w)} r_{v_{k-1} v_k}. \qquad (19)$$



Definition 19 (Malioutov et al. (2006)) (Walk-Summable)

A Gaussian distribution is walk-summable if for all $i, j \in V$ the unordered sum over all walks $w$ from $i$ to
$j$, $\sum_{w: i \to j} \phi(w)$, is well defined.
Proposition 20 (Malioutov et al. (2006)) (Walk-Summability)

Let $R$ be the partial correlation coefficient matrix of a Gaussian graphical model, whose diagonal
entries are zero. Each of the following conditions is equivalent to walk-summability:

(i) $\sum_{w: i \to j} |\phi(w)|$ converges for all $i, j \in V$,

(ii) $\sum_l \bar{R}^l$ converges, where $\bar{R}_{ij} = |R_{ij}|$ and $l$ is the length of the walk,

(iii) $\rho(\bar{R}) < 1$, where $\rho(\bar{R})$ is the spectral radius of $\bar{R}$,

(iv) $I - \bar{R} \succ 0$.
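
Condition (iii) can be checked numerically for a small model. The sketch below builds the partial correlation matrix from an information matrix $J$ and estimates the spectral radius of its entrywise absolute value by power iteration; the function names and the two example matrices are illustrative assumptions, not taken from the original text.

```python
import math


def partial_correlation_matrix(J):
    """R_ij = -J_ij / sqrt(J_ii * J_jj) off the diagonal, zero on the diagonal."""
    n = len(J)
    return [
        [0.0 if i == j else -J[i][j] / math.sqrt(J[i][i] * J[j][j]) for j in range(n)]
        for i in range(n)
    ]


def spectral_radius_symmetric_nonneg(A, iters=500):
    """Estimate rho(A) for a symmetric, entrywise non-negative matrix A.

    Power iteration is run on A^2 (positive semidefinite, so there is no
    sign oscillation); the square root of its dominant eigenvalue is rho(A).
    """
    n = len(A)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = matvec(matvec(v))
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        lam = norm_w / math.sqrt(sum(x * x for x in v))
        v = [x / norm_w for x in w]
    return math.sqrt(lam)


def is_walk_summable(J):
    """Check rho(|R|) < 1, i.e. condition (iii) with the entrywise absolute value."""
    R_bar = [[abs(x) for x in row] for row in partial_correlation_matrix(J)]
    return spectral_radius_symmetric_nonneg(R_bar) < 1.0


# Illustrative 3-node cycles: both matrices are positive definite.
J_weak = [[1.0, -0.3, -0.3], [-0.3, 1.0, -0.3], [-0.3, -0.3, 1.0]]
J_strong = [[1.0, 0.6, 0.6], [0.6, 1.0, 0.6], [0.6, 0.6, 1.0]]
summable_weak = is_walk_summable(J_weak)
summable_strong = is_walk_summable(J_strong)
```

The second matrix is a valid (positive definite) information matrix whose absolute partial correlations have spectral radius above one, which illustrates that walk-summability is a stricter requirement than validity of the model.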

The walk-summability of a Gaussian graphical model has been shown to be related to
the convergence of LBP. Proposition 21 in Malioutov et al. (2006) states that "If a model on
a (Gaussian) graph G is walk-summable, then LBP is well-posed, the means converge to the
true means and the LBP variances converge to walk-sums over the backtracking self-return
walks at each node". Motivated by the analysis for Gaussian graphical models, we extend
the walk-summability perspective to general graphical models in the following.

For a Gaussian graphical model, the interaction between two random variables is the
partial correlation coefficient. However, for a general graphical model, we have multi-dimensional
potential functions between two random variables. We hope to find a scalar
quantity to represent the interaction between them as well.

Watanabe and Fukumizu (2009) introduced weights on edges of an arbitrary binary
graph, defined an edge zeta function based on those weights and related the convexity of
the Bethe free energy to the edge zeta function. Specifically, let $\mathcal{P}$ be the set of prime
cycles $w = (v_{k_0}, v_{k_1}, \ldots, v_{k_i}, \ldots, v_{k_l}, v_{k_0})$ defined in Definition 11. For given weights $u$, the edge zeta
function is defined in Watanabe and Fukumizu (2009) by

$$\zeta_G(u) := \prod_{w \in \mathcal{P}} (1 - g(w))^{-1}, \qquad g(w) := u_{v_{k_0} v_{k_1}} \cdots u_{v_{k_{i-1}} v_{k_i}} \cdots u_{v_{k_l} v_{k_0}}.$$

We find that $(1 - g(w))^{-1} = \sum_{l=0}^{\infty} (g(w))^l$, which represents the walk sums of a prime cycle
and its repeated concatenations.

They introduced an adjacency matrix of directed edges, which is defined as follows:

$$\mathcal{M}_{i \to j, p \to q} = \begin{cases} 1, & \text{if } p \in \Gamma_i \setminus j, \\ 0, & \text{otherwise.} \end{cases}$$

Here we use $i \to j$ rather than $ij$ to explicitly represent a directed edge. They showed that

$$\zeta_G(u)^{-1} = \det(I - U\mathcal{M}), \qquad (20)$$

where $U$ is a diagonal matrix defined by $U_{i \to j, p \to q} = u_{i \to j} \delta_{i \to j, p \to q}$.

Let us define two directed edges $i \to j$ and $p \to q$ satisfying $p \in \Gamma_i \setminus j$ as adjacent edges,
and call $U\mathcal{M}$ an interaction coefficient matrix for adjacent edges. Therefore, Equation
(20) relates weighted prime cycles to the interaction coefficient matrix. Comparatively, for a
Gaussian graphical model, $J^{-1} = (I - R)^{-1} = \sum_{l=0}^{\infty} R^l$ and $(R^l)_{ij} = \sum_{w: i \xrightarrow{l} j} \phi(w)$, which
characterizes the relationship between the summation of weighted walks and the partial correlation
coefficient matrix.

Unlike the correlation coefficient, which is between two nodes (random variables), the interaction
coefficient is between two edges. We introduce a weight matrix $U$ with $U_{ij} = u_{i \to j}$. Notice
that $U$ is not symmetric. $(U^l)_{qj} = \sum_{w: q \xrightarrow{l} j} g(w)$ corresponds to weighted walks of length $l$
from $q$ to $j$, while $((U\mathcal{M})^l)_{i \to j, p \to q}$ corresponds to weighted walks of length $l$ from $q$ of edge
$p \to q$ to $j$ of edge $i \to j$. They are actually related in terms of weighted walks as follows:



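Watanabe and Fukumizu (2009) further defined weights as follows:

$$u_{i \to j} := \frac{\chi_{ij} - m_i m_j}{1 - m_j^2},$$

where the mean $m_i = E_{b_i}[x_i]$ and the correlation $\chi_{ij} = E_{b_{ij}}[x_i x_j]$. Let $\mathrm{Spec}(U\mathcal{M}) \subset \mathbb{C}$ denote the
spectrum of $U\mathcal{M}$. They presented the following theorem.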



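Theorem 21 (Theorem 4, Watanabe and Fukumizu (2009))

Given $U$, $\mathcal{M}$, $m_i$ and $\chi_{ij}$, $\mathrm{Spec}(U\mathcal{M}) \subset \mathbb{C} \setminus \mathbb{R}_{\ge 1} \Rightarrow$ the Hessian of the Bethe free energy is positive
definite at $\{m_i, \chi_{ij}\}$.

Since convexity of the Bethe free energy implies the uniqueness of the fixed point,
$\mathrm{Spec}(U\mathcal{M}) \subset \mathbb{C} \setminus \mathbb{R}_{\ge 1}$ is a corresponding walk-summability condition for a binary graph.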

A symmetrization of $u_{i \to j}$ and $u_{j \to i}$ was defined in Watanabe and Fukumizu (2009) by

$$\beta_{ij} := \frac{\chi_{ij} - m_i m_j}{\{(1 - m_i^2)(1 - m_j^2)\}^{1/2}} = \frac{\mathrm{Cov}_{b_{ij}}[x_i, x_j]}{\{\mathrm{Var}_{b_i}[x_i] \mathrm{Var}_{b_j}[x_j]\}^{1/2}}.$$

$\beta_{ij}$ is the correlation coefficient between $x_i$ and $x_j$. They showed $\mathrm{Spec}(U\mathcal{M}) = \mathrm{Spec}(B\mathcal{M})$,
where $(B)_{i \to j, p \to q} = \beta_{ij} \delta_{i \to j, p \to q}$. Therefore, similar to the Gaussian graphical model, for an
arbitrary binary graph, we can also use the correlation coefficient $\beta_{ij}$ to characterize the interaction
between two random variables and analyze the convergence of LBP.

We find another interaction coefficient matrix in Mooij and Kappen (2007). They
proved that for pairwise binary graphs, LBP converges to a unique fixed point if the
spectral radius of $A\mathcal{M}$ is strictly smaller than 1, where $A_{ij} := \tanh|J_{ij}|$. $A\mathcal{M}$ is also
an interaction coefficient matrix between neighboring edges. We can see
$\mathrm{Spec}(B\mathcal{M}) \subset \mathbb{C} \setminus \mathbb{R}_{\ge 1}$ or $\mathrm{Spec}(A\mathcal{M}) \subset \mathbb{C} \setminus \mathbb{R}_{\ge 1}$ as a walk-summable condition for binary graphs. However,
(Watanabe and Fukumizu, 2009, Lemma 3) showed that, given $\beta_{ij}$ at any fixed point of
LBP, $|\beta_{ij}| \le \tanh|J_{ij}|$. In other words, $B\mathcal{M}$ is tighter than $A\mathcal{M}$.

In the non-uniform convergence condition in Theorem 9 for an $N$-th level Bethe tree,
we add up all the $N$-th step walks from a root node, where the weight on edge $(t, s)$ is the
quantity $\frac{d(\psi_{ts})^2 - 1}{d(\psi_{ts})^2 + 1}$ and

$$d(\psi_{ts})^2 = \sup_{x_t, x_s, x'_t, x'_s} \sqrt{\frac{\psi_{ts}(x_t, x_s) \psi_{ts}(x'_t, x'_s)}{\psi_{ts}(x'_t, x_s) \psi_{ts}(x_t, x'_s)}}.$$

Similarly to the previous
analysis, we interpret this quantity as an interaction coefficient. Let $W$ be the interaction
coefficient matrix with entries $W_{t \to s, p \to q} = w_{ts} \delta_{t \to s, p \to q}$ and $w_{ts} = \frac{d(\psi_{ts})^2 - 1}{d(\psi_{ts})^2 + 1}$. We define the walk-summability
of a general graphical model as follows:

Definition 22 (Walk-summability of General Graphical Model)

Given $W$, a general pairwise graphical model is walk-summable when $\rho(W\mathcal{M}) < 1$.
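
The walk-summability test for a general pairwise model can be sketched for discrete potentials as follows. The code computes the dynamic-range strength of a potential table, turns it into an edge weight, builds the matrix over directed edges and estimates its spectral radius. The function names are hypothetical, the requirement that the edge $p \to q$ ends at $i$ (that is, $q = i$) is an assumption of this sketch, and the complete graph on four nodes is only an illustrative example.

```python
import math
from itertools import product


def dynamic_range_sq(psi):
    """d(psi)^2 = sup sqrt(psi(a,b) psi(c,d) / (psi(c,b) psi(a,d))) for a table psi."""
    rows, cols = range(len(psi)), range(len(psi[0]))
    return max(
        math.sqrt(psi[a][b] * psi[c][d] / (psi[c][b] * psi[a][d]))
        for a, c in product(rows, rows)
        for b, d in product(cols, cols)
    )


def edge_weight(psi):
    """Interaction coefficient (d^2 - 1) / (d^2 + 1) of one pairwise potential."""
    d2 = dynamic_range_sq(psi)
    return (d2 - 1.0) / (d2 + 1.0)


def interaction_matrix(neighbors, weight):
    """Build W M over directed edges.

    Entry (i->j, p->q) equals weight[(i, j)] when q == i and p is a neighbor
    of i other than j; the q == i requirement is an assumption of this sketch.
    """
    edges = [(i, j) for i in neighbors for j in neighbors[i]]
    index = {e: n for n, e in enumerate(edges)}
    A = [[0.0] * len(edges) for _ in edges]
    for (i, j) in edges:
        for p in neighbors[i]:
            if p != j:
                A[index[(i, j)]][index[(p, i)]] = weight[(i, j)]
    return A


def spectral_radius_nonneg(A, power=200):
    """Estimate rho(A) for a non-negative matrix as (max row sum of A^n)^(1/n)."""
    n = len(A)
    v = [1.0] * n
    log_scale = 0.0
    for _ in range(power):
        v = [sum(A[r][c] * v[c] for c in range(n)) for r in range(n)]
        top = max(v)
        if top == 0.0:
            return 0.0
        log_scale += math.log(top)
        v = [x / top for x in v]
    return math.exp(log_scale / power)


# Example: complete graph on 4 nodes with uniform binary potentials, eta = 0.7.
eta = 0.7
psi = [[eta, 1 - eta], [1 - eta, eta]]
w = edge_weight(psi)
neighbors = {v: [u for u in range(4) if u != v] for v in range(4)}
weight = {(i, j): w for i in neighbors for j in neighbors[i]}
rho = spectral_radius_nonneg(interaction_matrix(neighbors, weight))
walk_summable = rho < 1.0
```

For a regular graph with uniform weights every row of the matrix sums to (degree minus one) times the weight, so the estimate is exact; for irregular graphs the row-sum estimate only approaches the spectral radius as `power` grows.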
As for binary graphs, the walk-summability of a general graph is also related
to the convergence of LBP. (Mooij and Kappen, 2007, Theorem 4) presents a convergence
condition for general graphical models: LBP converges to a unique fixed point when the spectral
radius $\rho(W\mathcal{M}) < 1$. When the factors in Mooij and Kappen (2007) correspond to pairwise
potential functions, the condition is expressed through

$$d(\psi_{ts})^2 = \sup_{x_t, x_s, x'_t, x'_s} \sqrt{\frac{\psi_{ts}(x_t, x_s) \psi_{ts}(x'_t, x'_s)}{\psi_{ts}(x'_t, x_s) \psi_{ts}(x_t, x'_s)}}.$$

Therefore, the convergence
condition is equivalent to the walk-summability of the graphical model with the interaction
coefficient matrix $W\mathcal{M}$.

Lemma 23 Our non-uniform convergence condition in Theorem 9 is better than Theorem
4 in Mooij and Kappen (2007), or the walk-summable condition in Definition 22.

Proof Let $A = W\mathcal{M}$. $\rho(A) < 1$ is equivalent to $\|A^N\|_1 < 1$, $N \to \infty$
(Mooij and Kappen (2007)). $(A^N)_{i \to j, k \to l}$ is the summation of all the weighted walks from edge $i \to j$ to $k \to l$,
including backtracking walks. However, the walk-sum in (17) for an $N$-level Bethe tree does
not include backtracking walks; thus, it is smaller than $\|A^N\|_1$. Therefore, our non-uniform
convergence condition in Theorem 9 is milder than $\rho(A) < 1$, or the walk-summable condition,
which is illustrated in Fig. 8(a). ■



By "milder", we mean that the set satisfying the sufficient convergence condition is bigger.
Since our non-uniform convergence condition is derived from (Ihler et al., 2005, Theorem
14) and they are equivalent for the infinite Bethe tree, (Ihler et al., 2005, Theorem 14) is better
than (Mooij and Kappen, 2007, Theorem 4). When the convergence condition based on the
$N$-level Bethe tree is satisfied, the convergence condition based on the infinite Bethe tree must
be satisfied, because the error bounds are guaranteed to decrease after $N$ iterations of error
propagation. Similarly, the convergence condition based on the $N$-level Bethe tree is milder than
that based on the SAW tree. Therefore, we obtain the mildness ordering of convergence conditions, which
is shown in Fig. 8(b).

In the following, we will analyze the performance of LBP with respect to accuracy and
convergence rate.



5. Accuracy Bounds for Loopy Belief Propagation 

Recently, Ihler (2007) presented an accuracy bound for LBP which relates the belief of
a random variable to its true marginal. He showed that there exists a configuration on
some nodes of the SAW tree rooted at a certain node $s$ of the original graph, such that the
true marginal at node $s$ of the original graph is equal to the belief at root $s$ of the SAW
tree. Therefore, given certain external force functions on a subset of nodes, he adopted the
non-uniform distance bound in (Ihler et al., 2005, Theorem 14) to obtain an accuracy bound
between beliefs and true marginals.

Given $d(p(x)/b(x)) \le \delta$, his accuracy bound is as follows:

$$\frac{b(x)}{\delta^2 + (1 - \delta^2) b(x)} \le p(x) \le \frac{\delta^2 b(x)}{1 - (1 - \delta^2) b(x)}, \qquad (21)$$

where $\delta$ is an error bound in the dynamic-range measure, $p(x)$ is the normalized true marginal
and $b(x)$ is the normalized belief. Note that $\delta$ in (Ihler, 2007, Lemma 5) should be $\delta^2$.
Because our improved non-uniform distance bound has been shown to be tighter than his
non-uniform bound, we can improve his accuracy bound between the belief and the true
marginal. Let $\max_x |\log p(x)/b(x)| \le \log \varepsilon$, where $\varepsilon$ is an error bound in the maximum-error
measure obtained by applying our Corollary 7 under certain external force functions on a subset of
nodes of a SAW tree. Therefore, we have the accuracy bound $b(x)/\varepsilon \le p(x) \le \varepsilon b(x)$,
where $\varepsilon \le \delta^2$. Combining our accuracy bound with the bound in (21), we have the improved
bound

$$\max\left\{ \frac{b(x)}{\varepsilon}, \frac{b(x)}{\delta^2 + (1 - \delta^2) b(x)} \right\} \le p(x) \le \min\left\{ \varepsilon b(x), \frac{\delta^2 b(x)}{1 - (1 - \delta^2) b(x)} \right\}.$$
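
The combination of the two accuracy bounds can be sketched as below. The function names and the numerical values ($b(x) = 0.1$, $\delta^2 = 2$, $\varepsilon = 1.5$) are illustrative assumptions chosen so that the combined bound is strictly tighter on both sides; they are not results from the simulations.

```python
def ihler_accuracy_bounds(b, delta_sq):
    """Lower and upper bounds on p(x) from (21), given belief b and delta^2."""
    lower = b / (delta_sq + (1.0 - delta_sq) * b)
    upper = delta_sq * b / (1.0 - (1.0 - delta_sq) * b)
    return lower, upper


def improved_accuracy_bounds(b, delta_sq, eps):
    """Intersect (21) with the maximum-error bound b/eps <= p <= eps*b."""
    lower, upper = ihler_accuracy_bounds(b, delta_sq)
    return max(b / eps, lower), min(eps * b, upper)


# Illustrative values only.
base_lower, base_upper = ihler_accuracy_bounds(b=0.1, delta_sq=2.0)
new_lower, new_upper = improved_accuracy_bounds(b=0.1, delta_sq=2.0, eps=1.5)
```

With these values the lower bound rises from about 0.0526 to about 0.0667 and the upper bound drops from about 0.1818 to 0.15. Which of the two bounds is active depends on $b(x)$, which is why the maximum and minimum are taken pointwise.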

6. Rate of Convergence and Residual Scheduling 

For an iterative algorithm such as LBP, the rate of convergence is an important criterion
of performance. We will analyze the convergence rate of LBP by looking into the gradient
of error bounds on messages. The error bound-variation function $G_{sp}(\log \varepsilon)$ in (14) is a
measure of the variation of error bounds between successive iterations; on the other hand,
it reflects how fast LBP converges, because the smaller $G_{sp}(\log \varepsilon)$ is, the faster the error bounds
tighten. Because the dynamic-range measure is better than the maximum-error measure in terms
of convergence of LBP, we will use the following error bound-variation function:

$$G_{sp}(\log \varepsilon) = \log \prod_{t \in \Gamma_s \setminus p} \frac{d(\psi_{ts})^2 \varepsilon + 1}{d(\psi_{ts})^2 + \varepsilon} - \log \varepsilon,$$

where $\varepsilon$ is an error bound in the dynamic-range measure on the incoming error product. We will
use the first derivative of the function as a metric on the rate of convergence:

$$G'_{sp}(\log \varepsilon) = \sum_{t \in \Gamma_s \setminus p} \frac{\varepsilon (d(\psi_{ts})^4 - 1)}{(d(\psi_{ts})^2 \varepsilon + 1)(d(\psi_{ts})^2 + \varepsilon)} - 1.$$

Recall that $G'_{sp}(\log \varepsilon)$ should be less than zero to ensure convergence. When we have an
infinitesimal error disturbance, $|G'_{sp}(0)|$ will be used as a local rate of convergence. Because
our rate of convergence varies with each direction of message passing, messages in the direction
with the greatest rate will be updated prior to others in dynamic scheduling.
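
The local rate can be evaluated directly from the potential strengths of the incoming edges. The sketch below assumes the dynamic-range form of the error bound-variation function, $\sum_t \log\frac{d(\psi_{ts})^2\varepsilon + 1}{d(\psi_{ts})^2 + \varepsilon} - \log\varepsilon$, with a single bound $\varepsilon$ shared by all incoming edges; the function names and the example strengths are hypothetical.

```python
import math


def bound_variation(log_eps, d_sq_list):
    """G(log eps): change of the log error bound over one update.

    d_sq_list holds d(psi_ts)^2 for every incoming edge t -> s, t != p.
    """
    eps = math.exp(log_eps)
    return sum(math.log((d2 * eps + 1.0) / (d2 + eps)) for d2 in d_sq_list) - log_eps


def bound_variation_slope(log_eps, d_sq_list):
    """First derivative of G with respect to log eps."""
    eps = math.exp(log_eps)
    return sum(
        eps * (d2 * d2 - 1.0) / ((d2 * eps + 1.0) * (d2 + eps)) for d2 in d_sq_list
    ) - 1.0


# Illustrative example: d^2 = 0.7/0.3 on every edge.
d_sq = 0.7 / 0.3
slope_three_incoming = bound_variation_slope(0.0, [d_sq] * 3)
slope_two_incoming = bound_variation_slope(0.0, [d_sq] * 2)
```

At $\log \varepsilon = 0$ each edge contributes $(d^2 - 1)/(d^2 + 1) = 0.4$, so the slope is $0.2$ with three incoming edges (the bound does not contract) and $-0.2$ with two (the bound contracts). In a residual-style scheduler, directions with the most negative slope would be the ones updated first.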

Some work has been done on utilizing message residuals as a priority in dynamic
scheduling by Elidan et al. (2006) and Sutton and McCallum (2007). Rather than calculating
future message residuals, Sutton and McCallum (2007) utilized their upper bounds
as estimates of message residuals in their scheduling algorithm RBP0L. They adopted the
maximum-error measure as a metric of message residuals, which was defined by them as
$r(m_{ts}) = \max_{x_s} |\log e_{ts}(x_s)|$. They showed that, by the contraction property of the maximum-error
measure, it can be upper-bounded as $r(m_{ts}) \le \sum_{u \in \Gamma_t \setminus s} r(m_{ut})$. However, their upper
bound is not theoretically sound, because they ignored the normalization factor in their
proof. Therefore, we can modify their RBP0L by utilizing our upper bound in (8).

7. Fixed Points and Message Errors for Uniform Binary Graphs 

Mooij and Kappen (2005) analyzed the phase transition for binary graphs based on the Hessian
of the Bethe free energy. They presented ferromagnetic interactions, antiferromagnetic
interactions and spin-glass interactions, by analyzing the stability of the paramagnetic fixed point and
other stable or unstable fixed points. Watanabe and Fukumizu (2009) obtained several
interesting results on binary graphs based on the edge zeta function and the Bethe free energy. They
stated that the Bethe free energy is never convex for any connected graph with at least two
linearly independent cycles. They also stated that the number of fixed points of LBP
is always odd for binary graphs. We will analyze the behavior of fixed points of LBP based
on the message updating function directly.

In Section 3, we discussed uniform and non-uniform distance bounds on beliefs. An error
bound-variation function was introduced to study the variation of error bounds between
successive iterations. However, to study the mechanism behind message passing, we are
more interested in the variation of true errors. Since it is usually hard to formulate
the true error-variation function for general graphical models, in this section, we will only
explore true error-variation functions for binary graphs.

Let us first introduce a well-studied binary graph, the Ising model. The probability measure
of the Ising model can be expressed as:

$$P(x) = \frac{1}{Z} \exp\Big( \sum_{(s,t) \in E} J_{st} x_s x_t + \sum_{s \in V} \theta_s x_s \Big), \qquad (22)$$

corresponding to $\psi_{st}(x_s, x_t) = \exp(J_{st} x_s x_t)$ and $\psi_s(x_s) = \exp(\theta_s x_s)$ in (1). Because $\{x_s\}$
are $\pm 1$-valued, the potential functions can also be expressed as
$\begin{pmatrix} \exp(J_{st}) & \exp(-J_{st}) \\ \exp(-J_{st}) & \exp(J_{st}) \end{pmatrix}$ and $\begin{pmatrix} \exp(\theta_s) \\ \exp(-\theta_s) \end{pmatrix}$.
However, rather than working on the Ising model, we will study a simpler
model. We call it the completely uniform model (uniform connectivity, uniform potential
functions), which has the pairwise potential functions $\begin{pmatrix} a & b \\ b & a \end{pmatrix}$ and single-node potential
functions $\begin{pmatrix} c \\ d \end{pmatrix}$, where $a, b, c, d$ are positive. We will put single-node potential
functions into beliefs and only discuss the influence of pairwise potential functions on
message errors. We can easily find that a completely uniform graph has uniform messages.

Property 1 For a completely uniform graphical model, when synchronous LBP reaches a 
steady state, all messages are the same. 

Proof Completely uniform graphs are topologically invariant for each node. In other 
words, each message has the same LBP update equation. If some messages are different, 
for the symmetric network, LBP will not reach a steady state. ■ 

Because all messages have the same LBP update equation, we can calculate the fixed-point 
messages exactly and discuss the distances between them. 

7.1 Fixed Points and Quasi-Fixed Points 

Let us first discuss fixed-point messages for completely uniform graphs. Assume the degree
of each node is $k$. Let $m_{out} = \begin{pmatrix} y \\ 1 - y \end{pmatrix}$ denote the outgoing message and $m_{in} = \begin{pmatrix} x \\ 1 - x \end{pmatrix}$
denote each incoming message. Therefore, we have the following LBP updating function:

$$y = F(x) = \frac{a x^k + b (1 - x)^k}{(a + b)(x^k + (1 - x)^k)}. \qquad (23)$$

We can easily find that (23) is symmetric with respect to the point $(x = 0.5, y = 0.5)$.
The synchronous LBP update corresponds to the fixed-point iteration $x_{n+1} = F(x_n)$,
where $n$ is the iteration number. When $x_{n+1} = x_n$, the LBP message reaches a fixed point.
However, we sometimes have $x_{n+k} = x_n$ or $F^k(x) = x$, where $F^k(x)$ is the composition
of $F(x)$ with itself $k$ times, which shows $k$th-order periodicity. We define the
solutions to $F^k(x) = x$, $k > 1$, as quasi-fixed points, at which a belief network will oscillate. In
the following, we will show that LBP for completely uniform binary graphs has at
most second-order periodicity.
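
The fixed-point iteration can be simulated directly. The sketch below implements the updating function $F(x) = \frac{a x^k + b(1-x)^k}{(a+b)(x^k + (1-x)^k)}$ and iterates it for a ferromagnetic case ($a > b$) and an anti-ferromagnetic case ($a < b$). The helper names, the starting point 0.6 and the choice $k = 3$ (read here as the number of incoming messages combined in each update) are assumptions made for illustration.

```python
def lbp_update(x, a, b, k):
    """One synchronous update y = F(x) of the first message component."""
    num = a * x**k + b * (1.0 - x) ** k
    den = (a + b) * (x**k + (1.0 - x) ** k)
    return num / den


def iterate(x0, a, b, k, steps):
    """Return the trajectory x_0, x_1, ..., x_steps of x_{n+1} = F(x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(lbp_update(xs[-1], a, b, k))
    return xs


# a > b: the iteration settles on a single fixed point.
ferro = iterate(0.6, a=0.7, b=0.3, k=3, steps=200)
# a < b: the iteration alternates between two values that sum to one.
antiferro = iterate(0.6, a=0.3, b=0.7, k=3, steps=200)
```

In the second run consecutive iterates approach a pair $x^*$ and $1 - x^*$, which is the second-order periodicity discussed in the text; swapping $a$ and $b$ turns $F(x)$ into $1 - F(x)$, so both runs share the same $x^*$.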



Property 2 The LBP updating function in (23) has at most three real fixed points.

Proof The second derivative of $F(x)$ is as follows: when $a > b$,

$$F''(x) = \frac{k (a - b) x^{k-2} (1 - x)^{k-2} \left( (2x - k - 1) x^k + (2x + k - 1)(1 - x)^k \right)}{(a + b)(x^k + (1 - x)^k)^3} \begin{cases} > 0, & x \in (0, 0.5), \\ < 0, & x \in (0.5, 1), \\ = 0, & x = 0, 1, 0.5. \end{cases}$$

We can see that $F(x)$ is strictly convex when $0 < x < 0.5$ and strictly concave when
$0.5 < x < 1$. Similarly, for $a < b$, $F(x)$ is strictly concave when $0 < x < 0.5$ and strictly
convex when $0.5 < x < 1$. When this function intersects an arbitrary line, there can
be at most three crossing points. As shown in Fig. 9(a), it has at most three crossings
with $y = x$; similarly with $y = 1 - x$ in Fig. 9(b). ■
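
The sign pattern of the second derivative can be checked numerically. The sketch below compares the closed form $F''(x) = \frac{k(a-b)x^{k-2}(1-x)^{k-2}\left((2x-k-1)x^k + (2x+k-1)(1-x)^k\right)}{(a+b)(x^k+(1-x)^k)^3}$ against a central second difference of $F$; the function names, the step size and the parameter values are assumptions made for illustration.

```python
def lbp_update(x, a, b, k):
    """The updating function F(x) for a completely uniform binary graph."""
    return (a * x**k + b * (1.0 - x) ** k) / ((a + b) * (x**k + (1.0 - x) ** k))


def second_derivative(x, a, b, k):
    """Closed-form second derivative of F."""
    bracket = (2 * x - k - 1) * x**k + (2 * x + k - 1) * (1 - x) ** k
    num = k * (a - b) * x ** (k - 2) * (1 - x) ** (k - 2) * bracket
    den = (a + b) * (x**k + (1 - x) ** k) ** 3
    return num / den


def second_difference(x, a, b, k, h=1e-4):
    """Central finite-difference approximation of the second derivative of F."""
    return (lbp_update(x + h, a, b, k) - 2 * lbp_update(x, a, b, k)
            + lbp_update(x - h, a, b, k)) / (h * h)


a, b, k = 0.7, 0.3, 3
convex_side = second_derivative(0.3, a, b, k)     # expected positive for a > b
concave_side = second_derivative(0.7, a, b, k)    # expected negative for a > b
inflection = second_derivative(0.5, a, b, k)      # expected zero
```

The change of sign at $x = 0.5$ is what limits the number of crossings with a straight line to three.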

Figure 9: LBP updating function in (23) for $a > b$ and $a < b$.

This property conforms to the analysis of Mooij and Kappen (2005) and Watanabe and Fukumizu
(2009). We will show the symmetry of fixed-point messages for uniform binary graphs as
follows.
Property 3 For a completely uniform binary graph, synchronous LBP will either converge
to the unique fixed point $\begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}$ (paramagnetic fixed point), or converge to one of $\begin{pmatrix} x^* \\ 1 - x^* \end{pmatrix}$
and $\begin{pmatrix} 1 - x^* \\ x^* \end{pmatrix}$ when $a > b$ (ferromagnetic), or oscillate between $\begin{pmatrix} x^* \\ 1 - x^* \end{pmatrix}$ and $\begin{pmatrix} 1 - x^* \\ x^* \end{pmatrix}$
when $a < b$ (anti-ferromagnetic). When $a > b$, $x^*$ is the solution to $x^* = F(x^*)$; otherwise,
$x^*$ is the solution to $1 - x^* = F(x^*)$.

The proof appears in Appendix A.

The proof appears in Appendix A. 

From the previous property, we can conclude that completely uniform binary graphs
have at most second-order periodicity. In other words, $F^{2n}(x) = x \Leftrightarrow F^2(x) = x$ and
$F^{2n-1}(x) = x \Leftrightarrow F(x) = x$.

Let us calculate the fixed points and quasi-fixed points for the uniform graph in Fig. 3(c)
with $a = \eta$ and $b = 1 - \eta$. Solving $x = \frac{\eta x^3 + (1 - \eta)(1 - x)^3}{x^3 + (1 - x)^3}$ and $1 - x = \frac{\eta x^3 + (1 - \eta)(1 - x)^3}{x^3 + (1 - x)^3}$ yields the
fixed points and quasi-fixed points, respectively. Specifically, we
obtain the fixed-point solutions $\left\{ \frac{1}{2}, \frac{2 - \eta \pm \sqrt{(2 - \eta)(3\eta - 2)}}{2(2 - \eta)} \right\}$ and
the quasi-fixed-point solutions $\left\{ \frac{1}{2}, \frac{1 + \eta \pm \sqrt{(1 + \eta)(1 - 3\eta)}}{2(1 + \eta)} \right\}$. When $\eta > 2/3$,
the graph has two real fixed points besides 0.5; when $\eta < 1/3$, the graph has two real quasi-fixed
points besides 0.5; when $1/3 < \eta < 2/3$, the graph has one real fixed point, 0.5. For
instance, when $\eta = 0.7$, we have two stable fixed points $(0.9071, 0.0929)$ and $(0.0929, 0.9071)$;
when $\eta = 0.3$, we have two quasi-fixed points $(0.9071, 0.0929)$ and $(0.0929, 0.9071)$. We
observe that both cases have the same strength of potential function $d(\psi)^2 = 0.7/0.3$,
though their dynamic characteristics are different.
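
The fixed points for the cubic case can also be computed in closed form. With $a = \eta$, $b = 1 - \eta$ and exponent 3, the equation $x = F(x)$ factors as $(2x - 1)\left((2 - \eta)x^2 - (2 - \eta)x + (1 - \eta)\right) = 0$, and the quasi-fixed-point equation follows by replacing $\eta$ with $1 - \eta$. The sketch below evaluates these roots; the function names are hypothetical, and reading the quoted pair $(0.9071, 0.0929)$ as the normalized product of four incoming messages is an assumption that happens to reproduce the reported numbers.

```python
import math


def fixed_points(eta):
    """Real solutions of x = F(x) for exponent 3, a = eta, b = 1 - eta."""
    roots = [0.5]
    disc = (2.0 - eta) * (3.0 * eta - 2.0)
    if disc > 0.0:
        shift = math.sqrt(disc) / (2.0 * (2.0 - eta))
        roots += [0.5 + shift, 0.5 - shift]
    return roots


def quasi_fixed_points(eta):
    """Real solutions of 1 - x = F(x) for exponent 3, a = eta, b = 1 - eta."""
    roots = [0.5]
    disc = (1.0 + eta) * (1.0 - 3.0 * eta)
    if disc > 0.0:
        shift = math.sqrt(disc) / (2.0 * (1.0 + eta))
        roots += [0.5 + shift, 0.5 - shift]
    return roots


def normalized_product(x, n):
    """First component of the normalized product of n identical messages (x, 1 - x)."""
    return x**n / (x**n + (1.0 - x) ** n)


x_star = max(fixed_points(0.7))                   # fixed-point message component
incoming_product = normalized_product(x_star, 3)  # product of three incoming messages
belief = normalized_product(x_star, 4)            # product of four incoming messages
```

Under these assumptions the message fixed point is about 0.6387, the product of three incoming messages is about 0.8467 and the product of four is about 0.9071, matching the values quoted for $\eta = 0.7$.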

Based on Property 3, we find that for completely uniform graphs, the maximum multiplicative
error and the minimum multiplicative error between two fixed-point messages
are reciprocal. In other words, $d(e(x)) = \max_x e(x)$. Therefore, compared to our uniform
distance bound in Corollary 3, we have a tighter distance bound as follows.

Corollary 24 (Uniform Distance Bound for Completely Uniform Binary Graph) 

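$G(V, E)$ is a completely uniform binary graphical model. The log-distance bound on beliefs
at node $s$ is

$$\log \max_x E_s \le \sum_{t \in \Gamma_s} \log \frac{d(\psi_{ts})^2 \varepsilon + 1}{d(\psi_{ts})^2 + \varepsilon},$$

where $\varepsilon$ should satisfy

$$\log \varepsilon = \max_{(s,p) \in E} \sum_{t \in \Gamma_s \setminus p} \log \frac{d(\psi_{ts})^2 \varepsilon + 1}{d(\psi_{ts})^2 + \varepsilon}.$$

Proof $\log \max_x E_s = \log d(E_s) \le \sum_{t \in \Gamma_s} \log d(e_{ts}) \le \sum_{t \in \Gamma_s} \log \frac{d(\psi_{ts})^2 \varepsilon + 1}{d(\psi_{ts})^2 + \varepsilon}$. ■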

For the uniform graph in Fig. 3(c), when $\eta = 0.7$, the true log-distance is equal
to 2.2785, while our log-distance bound in Corollary 24 gives 2.2785, which is
exactly equal to the true value, and our Improved-UDB in Corollary 7 gives 2.3318.
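
The value 2.2785 can be reproduced numerically. The sketch below assumes that the bound has the form $\sum_{t \in \Gamma_s} \log\frac{d(\psi)^2\varepsilon + 1}{d(\psi)^2 + \varepsilon}$ with $\varepsilon$ the largest solution of the self-consistency equation over the incoming edges, and that every node of the graph has four neighbours (three incoming messages per update); the function name is hypothetical.

```python
import math


def uniform_binary_log_bound(d_sq, incoming, degree, iters=2000):
    """Log-distance bound for a completely uniform binary graph.

    Solves eps = ((d^2 eps + 1) / (d^2 + eps)) ** incoming by fixed-point
    iteration, started from the largest value the right-hand side can take,
    then sums the per-edge term over all `degree` neighbours.
    """
    eps = d_sq ** incoming
    for _ in range(iters):
        eps = ((d_sq * eps + 1.0) / (d_sq + eps)) ** incoming
    per_edge = math.log((d_sq * eps + 1.0) / (d_sq + eps))
    return degree * per_edge


# eta = 0.7, so d(psi)^2 = 0.7 / 0.3; degree 4 is an assumption of this sketch.
bound = uniform_binary_log_bound(d_sq=0.7 / 0.3, incoming=3, degree=4)
true_distance = math.log(0.9071 / 0.0929)
```

Under these assumptions the self-consistency equation is the same equation that the ratio of fixed-point message components satisfies, which is why the bound and the true log-distance coincide in this case.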
Figure 10: True error variation function (plotted against $\log(\max E^{in})$) when the $M$'s are
fixed-point messages for the completely uniform graph in Fig. 3(c), with $a = 0.7$, $b = 0.3$. The
fixed-point messages are $M = (0.8467, 0.1533)$, $M = (0.1533, 0.8467)$ and $M = (0.5, 0.5)$.



7.2 True Error- Variation Function 

In this section, we characterize the true error-variation function for a completely uniform
binary graph. We have the following message updating equation:

$$\begin{pmatrix} m\, e_1^{out} \\ (1 - m)\, e_2^{out} \end{pmatrix} = \frac{1}{a + b} \begin{pmatrix} a & b \\ b & a \end{pmatrix} \begin{pmatrix} M E_1^{in} \\ (1 - M) E_2^{in} \end{pmatrix},$$

where $M$ is the product of fixed-point incoming messages, $m$ is the fixed-point outgoing
message, $E^{in}$ represents the product of incoming errors and $e^{out}$ represents the outgoing
error. Assuming $E^{in}$ is the same for each node at a level on the Bethe tree, we have the
following error equation:

$$\begin{pmatrix} E_1^{out} \\ E_2^{out} \end{pmatrix} = \frac{(aM + b(1 - M))^k + (bM + a(1 - M))^k}{(aME_1^{in} + b(1 - M)E_2^{in})^k + (bME_1^{in} + a(1 - M)E_2^{in})^k} \begin{pmatrix} \dfrac{(aME_1^{in} + b(1 - M)E_2^{in})^k}{(aM + b(1 - M))^k} \\[2ex] \dfrac{(bME_1^{in} + a(1 - M)E_2^{in})^k}{(bM + a(1 - M))^k} \end{pmatrix},$$

where $E^{out}$ is the product of outgoing errors flowing into a node at the upper level.

When $E_1^{in} > E_2^{in}$ and $a > b$, we have $E_1^{out} > E_2^{out}$. Therefore, letting $E$ denote
$E_1^{in}$, we obtain the true error variation function:

$$G(\log E) = \log \max E^{out} - \log \max E^{in} = \log \left[ \frac{(aME + b(1 - ME))^k}{(aM + b(1 - M))^k} \cdot \frac{(aM + b(1 - M))^k + (bM + a(1 - M))^k}{(aME + b(1 - ME))^k + (bME + a(1 - ME))^k} \right] - \log E, \qquad (24)$$

when $1 < E < 1/M$ and $a > b$.

An example of the true error variation function is illustrated in Fig. 10 for the graphical
model in Fig. 3(c). The curve of the error variation function $G(\log E)$ in Equation (24)
varies with the choice of $M$. The black curve corresponds to $G(\log E)$ for
$M = (0.5, 0.5)$, while the blue curve corresponds to $G(\log E)$ for $M = (0.8467, 0.1533)$
or $M = (0.1533, 0.8467)$. Since $G^{(1)}(\infty) = -1$, when $G(\log E)$ does not cross the horizontal
axis except at $\log E = 0$, we have $G(\log E) < 0$ for $\log E > 0$. In other words,
$\log E$ will eventually decrease to zero and LBP converges to a unique fixed point. However,
when $G(\log E)$ crosses the horizontal axis at points other than $\log E = 0$, $\log E$ will
eventually stay at a stable point, in which case the product of the incoming errors at one
level of the Bethe tree equals the product of the incoming errors at its upper level. In other
words, errors will not decrease after one LBP update. In Fig. 10, for the black curve, when
$\log E$ leaves zero, it will eventually stay at $A$. For the blue curve in Fig. 10, when $\log E$
is between zero and the value at point $B$, it will decrease and finally stay at zero; when
$\log E$ is bigger than the value at point $B$, it will increase and finally stay at point $C$.
We can see that point $B$ is an unstable point.

From the example in Fig. 10, we can observe that the zero-crossing points of $G(\log E)$
correspond to the exact log distances between two fixed-point messages. Specifically, the
value at point $A$ is equal to the maximal log distance between $M = (0.8467, 0.1533)$ and
$M = (0.5, 0.5)$, the value at point $B$ is equal to the maximal log distance between
$M = (0.5, 0.5)$ and $M = (0.1533, 0.8467)$, and the value at point $C$ is equal to the maximal
log distance between $M = (0.8467, 0.1533)$ and $M = (0.1533, 0.8467)$. Therefore, our true
error function in Equation (24) characterizes the true distance between fixed points when
LBP does not converge.
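The zero crossings described above can be checked numerically. The sketch below evaluates Equation (24) directly, with $a = 0.7$ and $b = 0.3$ as in Fig. 10. The exponent $k = 3$ is an assumption, chosen because it reproduces the fixed-point messages quoted in the caption. The helper `T` (the map from an incoming message product to the next-level product) and the names `log_A`, `log_B`, `log_C` are hypothetical and introduced only for illustration.

```python
import math

def T(M, a, b, k):
    """Map the incoming message product (M, 1 - M) to the next-level product."""
    p = (a * M + b * (1.0 - M)) ** k
    q = (b * M + a * (1.0 - M)) ** k
    return p / (p + q)

def G(log_E, M, a, b, k):
    """True error-variation function of Equation (24), for 1 < E < 1/M."""
    E = math.exp(log_E)
    ME = M * E  # perturbed first component; the second one is 1 - ME
    ratio = (a * ME + b * (1.0 - ME)) ** k / (a * M + b * (1.0 - M)) ** k
    norm = ((a * M + b * (1.0 - M)) ** k + (b * M + a * (1.0 - M)) ** k) / (
        (a * ME + b * (1.0 - ME)) ** k + (b * ME + a * (1.0 - ME)) ** k)
    return math.log(ratio * norm) - log_E

a, b, k = 0.7, 0.3, 3  # k = 3 is an assumption, see the lead-in

# Locate the nontrivial fixed-point product by iterating T from a biased start.
M_hi = 0.9
for _ in range(300):
    M_hi = T(M_hi, a, b, k)
M_lo = 1.0 - M_hi  # mirror fixed point, by symmetry

log_A = math.log(M_hi / 0.5)   # distance between (M_hi, M_lo) and (0.5, 0.5)
log_B = math.log(0.5 / M_lo)   # distance between (0.5, 0.5) and (M_lo, M_hi)
log_C = math.log(M_hi / M_lo)  # distance between (M_hi, M_lo) and (M_lo, M_hi)

print(round(M_hi, 4), round(M_lo, 4))
print(G(log_A, 0.5, a, b, k), G(log_B, M_lo, a, b, k), G(log_C, M_lo, a, b, k))
```

With these settings the three evaluations of `G` vanish up to floating-point error, and the sign of `G` on either side of `log_B` is consistent with $B$ being unstable and $C$ being stable.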



8. Conclusion 

In this paper, we presented tighter error bounds on Loopy Belief Propagation (LBP) and 
used these bounds to study the dynamics — error, convergence, accuracy, and scheduling — 
of the sum-product algorithm. Specifically, we derived tight upper- and lower-bounds on 
error propagation in synchronous belief networks. We subsequently relied on these bounds 
to provide uniform and non-uniform distance bounds for the sum-product algorithm. We 
then used the distance bounds to obtain uniform and non-uniform sufficient conditions for 
convergence of the sum-product algorithm. We investigated the relation between the convergence
of LBP and the sparsity and walk-summability of graphical models. We also showed
that upper bounds on message errors can be used to determine a scheduling priority
in sequential belief propagation. Moreover, we used our error bounds to study the accuracy
of the sum-product algorithm. We also presented a case study of LBP
by characterizing the dynamics of the sum-product algorithm for completely uniform graphs 
and analyzed its fixed and quasi-fixed (oscillatory) points. 






Appendix A. Detailed Proofs 
Proof of Theorem 1

Proof We use the maximum multiplicative error function as an error measure:

$$\max_{x_s} e^{i+1}_{ts}(x_s) = \max_{x_s} \frac{\int \psi_{ts}(x_t, x_s) M_{ts}(x_t) E^i_{ts}(x_t)\,dx_t}{\int \psi_{t*}(x_t) M_{ts}(x_t) E^i_{ts}(x_t)\,dx_t} \times \frac{\int \psi_{t*}(x_t) M_{ts}(x_t)\,dx_t}{\int \psi_{ts}(x_t, x_s) M_{ts}(x_t)\,dx_t},$$

where $\psi_{t*}(x_t) = \int \psi_{ts}(x_t, x_s)\,dx_s$. The minimum multiplicative error function
$\min_{x_s} e^{i+1}_{ts}(x_s)$ is also used as an error measure in this theorem. The assumptions
used throughout this proof are: $\psi_{ts}(x_t, x_s)$ is positive; the message product $M_{ts}(x_t)$
and the polluted message product $M_{ts}(x_t) E^i_{ts}(x_t)$ are positive and normalized.



We use the same framework of proof as that in (Ihler et al., 2005, Thm. 8). Let us first
introduce a lemma that will be used in our proof.



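Lemma 25 For $f_1, f_2, g_1, g_2$ all positive,

$$\frac{f_1 + f_2}{g_1 + g_2} \le \max\left[\frac{f_1}{g_1}, \frac{f_2}{g_2}\right], \qquad \frac{f_1 + f_2}{g_1 + g_2} \ge \min\left[\frac{f_1}{g_1}, \frac{f_2}{g_2}\right].$$

Proof The left inequality is proved in Ihler et al. (2005). Let us restate it here. Assume
without loss of generality that $f_1/g_1 \ge f_2/g_2$, so that

$$f_1 g_2 \ge f_2 g_1 \;\Rightarrow\; f_1 g_2 + f_1 g_1 \ge f_2 g_1 + f_1 g_1 \;\Rightarrow\; \frac{f_1}{g_1} \ge \frac{f_1 + f_2}{g_1 + g_2}.$$

For the right inequality, assume without loss of generality that $f_1/g_1 \le f_2/g_2$, so that

$$f_1 g_2 \le f_2 g_1 \;\Rightarrow\; f_1 g_2 + f_1 g_1 \le f_2 g_1 + f_1 g_1 \;\Rightarrow\; \frac{f_1}{g_1} \le \frac{f_1 + f_2}{g_1 + g_2}.$$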
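As a quick sanity check of Lemma 25, the following sketch tests both inequalities on randomly drawn positive numbers. It is only an illustration and does not replace the proof; the function name `mediant_bounds_hold`, the sampling range and the small relative tolerance (used to absorb floating-point rounding) are choices made here.

```python
import random

def mediant_bounds_hold(f1, f2, g1, g2, rel_tol=1e-12):
    """Check min(f1/g1, f2/g2) <= (f1+f2)/(g1+g2) <= max(f1/g1, f2/g2)."""
    ratio = (f1 + f2) / (g1 + g2)
    lo = min(f1 / g1, f2 / g2)
    hi = max(f1 / g1, f2 / g2)
    return lo * (1.0 - rel_tol) <= ratio <= hi * (1.0 + rel_tol)

random.seed(0)  # fixed seed so the check is reproducible
samples = [[random.uniform(0.01, 10.0) for _ in range(4)] for _ in range(10000)]
all_ok = all(mediant_bounds_hold(*s) for s in samples)
print(all_ok)
```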



Similar to the analysis in (Ihler et al., 2005, Lemma 26), we need the following lemma
to assist our proof. In the following, we omit reference to the iteration number of the
messages and errors for simplicity and clarity of presentation.

Lemma 26 The maximum of $\max_{x_s} e_{ts}(x_s)$ or the minimum of $\min_{x_s} e_{ts}(x_s)$ is attained
when

$$\psi_{ts}(x_t, x_s) = 1 + (d(\psi_{ts})^2 - 1)\chi_\psi(x_t), \quad \psi_{t*}(x_t) = 1 + (d(\psi_{t*})^2 - 1)\chi_*(x_t), \quad E_{ts}(x_t) = 1 + (d(E_{ts})^2 - 1)\chi_E(x_t),$$

where $\chi_\psi$, $\chi_*$ and $\chi_E$ are indicator functions.

Proof Let $\psi_{ts}(x_t, x_s) = \alpha_1 \psi_1(x_t, x_s) + \alpha_2 \psi_2(x_t, x_s)$, where $\alpha_1 \ge 0$, $\alpha_2 \ge 0$,
$\alpha_1 + \alpha_2 = 1$. In other words, $\psi_{ts}(x_t, x_s)$ is a convex combination of two arbitrary positive
functions $\psi_1(x_t, x_s)$ and $\psi_2(x_t, x_s)$. Thus, by applying Lemma 25 we have:

$$\frac{\alpha_1 \int \psi_1(x_t, x_s) M_{ts}(x_t) E_{ts}(x_t)\,dx_t + \alpha_2 \int \psi_2(x_t, x_s) M_{ts}(x_t) E_{ts}(x_t)\,dx_t}{\alpha_1 \int \psi_1(x_t, x_s) M_{ts}(x_t)\,dx_t + \alpha_2 \int \psi_2(x_t, x_s) M_{ts}(x_t)\,dx_t} \le \max\left[\frac{\int \psi_1(x_t, x_s) M_{ts}(x_t) E_{ts}(x_t)\,dx_t}{\int \psi_1(x_t, x_s) M_{ts}(x_t)\,dx_t}, \frac{\int \psi_2(x_t, x_s) M_{ts}(x_t) E_{ts}(x_t)\,dx_t}{\int \psi_2(x_t, x_s) M_{ts}(x_t)\,dx_t}\right].$$






We find that $\max_{x_s} e_{ts}(x_s)$ is maximized when we take the maximum of the RHS expression
in the previous inequality. Let us scale $\psi_{ts}(x_t, x_s)$ so that the minimal value of the function
is 1. Thus, $\psi_{ts}(x_t, x_s)$ can be composed as a convex combination of functions of
the form $1 + (d(\psi_{ts})^2 - 1)\chi_\psi(x_t)$, where $\chi_\psi(x_t)$ is an indicator function. It follows that
$\max_{x_s} e_{ts}(x_s)$ is maximized when $\psi_{ts}(x_t, x_s) = 1 + (d(\psi_{ts})^2 - 1)\chi_\psi(x_t)$. The proofs for
$\psi_{t*}(x_t)$ and $E_{ts}(x_t)$ are similar.

To minimize $\min_{x_s} e_{ts}(x_s)$, by applying Lemma 25 we have:

$$\frac{\alpha_1 \int \psi_1(x_t, x_s) M_{ts}(x_t) E_{ts}(x_t)\,dx_t + \alpha_2 \int \psi_2(x_t, x_s) M_{ts}(x_t) E_{ts}(x_t)\,dx_t}{\alpha_1 \int \psi_1(x_t, x_s) M_{ts}(x_t)\,dx_t + \alpha_2 \int \psi_2(x_t, x_s) M_{ts}(x_t)\,dx_t} \ge \min\left[\frac{\int \psi_1(x_t, x_s) M_{ts}(x_t) E_{ts}(x_t)\,dx_t}{\int \psi_1(x_t, x_s) M_{ts}(x_t)\,dx_t}, \frac{\int \psi_2(x_t, x_s) M_{ts}(x_t) E_{ts}(x_t)\,dx_t}{\int \psi_2(x_t, x_s) M_{ts}(x_t)\,dx_t}\right].$$

Furthermore, by constructing the potential function $\psi_{ts}(x_t, x_s)$ as a convex combination of
functions of the form $1 + (d(\psi_{ts})^2 - 1)\chi_\psi(x_t)$, where $\chi_\psi(x_t)$ is an indicator function, we
find that $\min_{x_s} e_{ts}(x_s)$ is minimized when $\psi_{ts}(x_t, x_s)$ is one of these functions. The proofs
for $\psi_{t*}(x_t)$ and $E_{ts}(x_t)$ are similar. ■

So $\max_{x_s} e_{ts}(x_s)$ is bounded by

$$\frac{\int \psi_{ts}(x_t, x_s) M_{ts}(x_t) E_{ts}(x_t)\,dx_t}{\int \psi_{t*}(x_t) M_{ts}(x_t) E_{ts}(x_t)\,dx_t} \times \frac{\int \psi_{t*}(x_t) M_{ts}(x_t)\,dx_t}{\int \psi_{ts}(x_t, x_s) M_{ts}(x_t)\,dx_t}$$

$$\le \frac{\int (1 + (d(\psi_{ts})^2 - 1)\chi_\psi(x_t)) M_{ts}(x_t) (1 + (d(E_{ts})^2 - 1)\chi_E(x_t))\,dx_t}{\int (1 + (d(\psi_{t*})^2 - 1)\chi_*(x_t)) M_{ts}(x_t) (1 + (d(E_{ts})^2 - 1)\chi_E(x_t))\,dx_t} \times \frac{\int (1 + (d(\psi_{t*})^2 - 1)\chi_*(x_t)) M_{ts}(x_t)\,dx_t}{\int (1 + (d(\psi_{ts})^2 - 1)\chi_\psi(x_t)) M_{ts}(x_t)\,dx_t}.$$
Define the quantities:

$$M_A = \int M_{ts}(x_t)\chi_\psi(x_t)\,dx_t, \quad M_B = \int M_{ts}(x_t)\chi_*(x_t)\,dx_t, \quad M_E = \int M_{ts}(x_t)\chi_E(x_t)\,dx_t,$$

$$M_{AE} = \int M_{ts}(x_t)\chi_\psi(x_t)\chi_E(x_t)\,dx_t, \quad M_{BE} = \int M_{ts}(x_t)\chi_*(x_t)\chi_E(x_t)\,dx_t,$$

$$\alpha_1 = d(\psi_{ts})^2 - 1, \quad \alpha_2 = d(\psi_{t*})^2 - 1, \quad \beta = d(E_{ts})^2 - 1.$$
The maximum multiplicative error $\max_{x_s} e_{ts}(x_s)$ is upper-bounded by $\Delta_1$, where

$$\Delta_1 = \max_M \frac{1 + \alpha_1 M_A + \beta M_E + \alpha_1 \beta M_{AE}}{1 + \alpha_2 M_B + \beta M_E + \alpha_2 \beta M_{BE}} \times \frac{1 + \alpha_2 M_B}{1 + \alpha_1 M_A}.$$

The maximum is obtained when $M_{AE} = M_A = M_E = 1 - M_B$ and $M_{BE} = 0$, which gives

$$\Delta_1 = \max_{M_E} \frac{1 + (\alpha_1 + \beta + \alpha_1 \beta) M_E}{1 + \alpha_2 + (\beta - \alpha_2) M_E} \times \frac{1 + \alpha_2 - \alpha_2 M_E}{1 + \alpha_1 M_E}.$$

Taking the derivative with respect to $M_E$ and setting it to zero, we obtain

$$\max_{x_s} e_{ts}(x_s) \le \Delta_1 = \left(\frac{d(\psi_{ts}) d(\psi_{t*}) d(E_{ts}) + 1}{d(\psi_{ts}) d(\psi_{t*}) + d(E_{ts})}\right)^2.$$

Similarly to what we have done so far, we can lower-bound $\min_{x_s} e_{ts}(x_s)$ with respect
to $\psi_{ts}(x_t, x_s)$, $\psi_{t*}(x_t)$ and $E_{ts}(x_t)$, to obtain

$$\min_{x_s} e_{ts}(x_s) \ge \left(\frac{d(\psi_{ts}) d(\psi_{t*}) + d(E_{ts})}{d(\psi_{ts}) d(\psi_{t*}) d(E_{ts}) + 1}\right)^2 = \frac{1}{\Delta_1}.$$
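The maximization over $M_E$ can be cross-checked numerically. The sketch below evaluates the one-dimensional expression for $\Delta_1$ on a fine grid over $[0, 1]$ and compares the largest value with the squared closed form. It assumes the squared form of the bound, and the function names and the three test triples of $(d(\psi_{ts}), d(\psi_{t*}), d(E_{ts}))$ are hypothetical values picked for illustration.

```python
def delta1_of_ME(x, d_ts, d_tstar, d_E):
    """Delta_1 as a function of M_E = x, with M_AE = M_A = M_E = 1 - M_B, M_BE = 0."""
    a1 = d_ts ** 2 - 1.0      # alpha_1
    a2 = d_tstar ** 2 - 1.0   # alpha_2
    beta = d_E ** 2 - 1.0
    first = (1.0 + (a1 + beta + a1 * beta) * x) / (1.0 + a2 + (beta - a2) * x)
    second = (1.0 + a2 - a2 * x) / (1.0 + a1 * x)
    return first * second

def delta1_closed_form(d_ts, d_tstar, d_E):
    """Closed-form maximum obtained by setting the derivative to zero."""
    return ((d_ts * d_tstar * d_E + 1.0) / (d_ts * d_tstar + d_E)) ** 2

def delta1_grid_max(d_ts, d_tstar, d_E, n=200001):
    """Brute-force maximum of delta1_of_ME over a uniform grid on [0, 1]."""
    return max(delta1_of_ME(i / (n - 1), d_ts, d_tstar, d_E) for i in range(n))

for params in [(2.0, 2.0, 2.0), (3.0, 1.0, 2.0), (1.5, 1.2, 3.0)]:
    print(params, delta1_grid_max(*params), delta1_closed_form(*params))
```

For the three triples above, the grid maximum and the closed form should agree closely, which supports reading the bound with the exponent 2.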



Proof of Corollary 3

Proof Let $\Delta_{ut}(x) = \left(\frac{d(\psi_{ut}) d(\psi_{u*}) x + 1}{d(\psi_{ut}) d(\psi_{u*}) + x}\right)^2$, $x \ge 1$, $ut \in E$. Therefore,



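$$d(E^i_{ts}) \le \prod_{u \in \Gamma_t \setminus s} d(e^i_{ut}) = \prod_{u \in \Gamma_t \setminus s} \sqrt{\frac{\max e^i_{ut}}{\min e^i_{ut}}} \le \varepsilon^i_{ts} = \prod_{u \in \Gamma_t \setminus s} \Delta_{ut}(d(E^{i-1}_{ut})).$$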



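Thus, we have

$$\max_{x_s} E^{i+1}_{sp}(x_s) \le \prod_{t \in \Gamma_s \setminus p} \max_{x_s} e^{i+1}_{ts}(x_s) \le \varepsilon^{i+1}_{sp} = \prod_{t \in \Gamma_s \setminus p} \Delta_{ts}(d(E^i_{ts})) \le \prod_{t \in \Gamma_s \setminus p} \Delta_{ts}(\varepsilon^i_{ts}) \le \prod_{t \in \Gamma_s \setminus p} \Delta_{ts}\Big(\max_{t \in \Gamma_s \setminus p} \varepsilon^i_{ts}\Big) = \Delta_s\Big(\max_{t \in \Gamma_s \setminus p} \varepsilon^i_{ts}\Big).$$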

The term $\varepsilon^{i+1}_{sp}$ is an upper bound on the incoming error product $E^{i+1}_{sp}(x_s)$ at iteration
$i + 1$, while $\max_{t \in \Gamma_s \setminus p} \varepsilon^i_{ts}$ is the maximum of the upper bounds on the incoming error
products $\{E^i_{ts}(x_t), t \in \Gamma_s \setminus p\}$ at iteration $i$. We would like to have
$\varepsilon^{i+1}_{sp} < \max_{t \in \Gamma_s \setminus p} \varepsilon^i_{ts}$. Denoting $\varepsilon = \max_{t \in \Gamma_s \setminus p} \varepsilon^i_{ts}$, let us introduce an error
bound-variation function:

$$G_{sp}(\log \varepsilon) = \log \Delta_s(\varepsilon) - \log \varepsilon \ge \log \varepsilon^{i+1}_{sp} - \log \max_{t \in \Gamma_s \setminus p} \varepsilon^i_{ts}, \quad \varepsilon \ge 1,$$

which describes the variation of the error bound after each iteration. When $G_{sp}(\log \varepsilon) = 0$, the
log-distance bound $\log \varepsilon$ reaches a fixed point, which is the maximal distance between message
products at various iterations. Because $G^{(2)}_{sp}(\log \varepsilon) < 0$ for $\log \varepsilon > 0$ and $G^{(1)}_{sp}(\infty) = -1/2$,
$G^{(1)}_{sp}(\log \varepsilon)$ will decrease until it is equal to $-1/2$. Therefore, $G_{sp}$ has only one crossing point
besides $\log \varepsilon = 0$ (the zero crossing point). This nonzero crossing point is a stable fixed point of
the function $G_{sp}(\log \varepsilon)$. In other words, once $\log \varepsilon$ leaves the zero crossing point, it will stay at
this stable crossing point, $\log \varepsilon^*$, which corresponds to the upper bound on error products.
Because the distance between fixed points of $B_s(x_s)$ is

$$\log E_s(x_s) = \log \prod_{t \in \Gamma_s} e_{ts}(x_s) \le \log \prod_{t \in \Gamma_s} \Delta_{ts}(\varepsilon^*),$$

we can obtain the log-distance bound on $B_s(x_s)$ by taking the maximum $\varepsilon^*$. ■
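Proof of Theorem 4

Proof Let us revisit the error bound-variation function in Equation (14):

$$G_{sp}(\log \varepsilon) = \log \prod_{t \in \Gamma_s \setminus p} \left(\frac{d(\psi_{ts}) d(\psi_{t*}) \varepsilon + 1}{d(\psi_{ts}) d(\psi_{t*}) + \varepsilon}\right)^2 - \log \varepsilon,$$

which describes the variation of the error bound after each iteration. To guarantee that
LBP converges, it is sufficient to require $G_{sp}(\log \varepsilon) < 0$, $\forall \log \varepsilon > 0$. Let $z = \log \varepsilon$. The
second derivative of $G_{sp}(z)$ is

$$G^{(2)}_{sp}(z) = 2 \sum_{t \in \Gamma_s \setminus p} \frac{d(\psi_{ts}) d(\psi_{t*})\, e^z \left((d(\psi_{ts}) d(\psi_{t*}))^2 - 1\right)(1 - e^{2z})}{(d(\psi_{ts}) d(\psi_{t*}) e^z + 1)^2 (d(\psi_{ts}) d(\psi_{t*}) + e^z)^2} < 0$$

when $d(\psi_{ts}) d(\psi_{t*}) > 1$ and $z > 0$. Hence, when $z > 0$, $G_{sp}(z)$ is strictly concave.
The first derivative of $G_{sp}(z)$ is

$$G^{(1)}_{sp}(z) = 2 \sum_{t \in \Gamma_s \setminus p} \frac{e^z \left((d(\psi_{ts}) d(\psi_{t*}))^2 - 1\right)}{(d(\psi_{ts}) d(\psi_{t*}) e^z + 1)(d(\psi_{ts}) d(\psi_{t*}) + e^z)} - 1.$$

Because $G_{sp}(z = 0) = 0$, if the first derivative satisfies $G^{(1)}_{sp}(z = 0) < 0$, we will have
$G_{sp}(z > 0) < 0$. Therefore, we require

$$G^{(1)}_{sp}(0) = 2 \sum_{t \in \Gamma_s \setminus p} \frac{(d(\psi_{ts}) d(\psi_{t*}))^2 - 1}{(d(\psi_{ts}) d(\psi_{t*}) + 1)(d(\psi_{ts}) d(\psi_{t*}) + 1)} - 1 < 0 \;\Longleftrightarrow\; \sum_{t \in \Gamma_s \setminus p} \frac{d(\psi_{ts}) d(\psi_{t*}) - 1}{d(\psi_{ts}) d(\psi_{t*}) + 1} < \frac{1}{2}.$$

■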
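The argument above can be illustrated numerically. In the sketch below, `Ds` is a hypothetical list holding the products $d(\psi_{ts}) d(\psi_{t*})$ for the neighbors $t \in \Gamma_s \setminus p$. The function `G_sp` evaluates the error bound-variation function with the squared ratio assumed in the reconstruction of Equation (14), and `uniform_condition` tests the resulting sufficient condition. The two example lists are illustrative values, not taken from the paper.

```python
import math

def G_sp(z, Ds):
    """Error bound-variation function at z = log(eps), squared-ratio form assumed."""
    ez = math.exp(z)
    return sum(2.0 * math.log((D * ez + 1.0) / (D + ez)) for D in Ds) - z

def uniform_condition(Ds):
    """Sufficient condition: sum over neighbors of (D - 1)/(D + 1) is below 1/2."""
    return sum((D - 1.0) / (D + 1.0) for D in Ds) < 0.5

weak = [1.2, 1.2, 1.2]               # weak potentials: condition holds
strong = [math.sqrt(0.7 / 0.3)] * 3  # stronger potentials: condition fails

grid = [0.01 * i for i in range(1, 1001)]  # z in (0, 10]
print(uniform_condition(weak), max(G_sp(z, weak) for z in grid))
print(uniform_condition(strong), G_sp(0.1, strong))
```

When the condition holds, `G_sp` stays negative over the whole grid, as the concavity argument predicts. When it fails, `G_sp` becomes positive just above zero, so the bound no longer contracts.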

Proof of Theorem 9

Proof Recall that in the proof of the uniform convergence condition, we used an error
bound-variation function $G_{sp}(\log \varepsilon)$, which was originally introduced to describe
$(\log \varepsilon^{i+1}_{sp} - \log \varepsilon^i_{ts})$, for $\forall (s, p) \in E$. For each $T(G, v)$, given $vu \in E$, let us
introduce the following error bound-variation function:

G vu ({loge WiV },loge) = E Wi er v \u lo § ^Sfj^t ~ lo g g ' 

_ „ . rf(V'i(^ ; ,-)iK)) 2£ "'j"'i+ 1 

where $\{w_r\}$ is the set of leaf nodes of $T(G, v)$.

To guarantee that LBP converges, it is sufficient to have $G_{vu}(\log \varepsilon) < 0$ for $\forall \log \varepsilon > 0$.
Because $G_{vu}(\log \varepsilon = 0) = 0$, when $G^{(1)}_{vu}(\log \varepsilon = 0) < 0$, we will definitely have
$G_{vu}(0 < \log \varepsilon < \delta) < 0$, where $\delta$ is a small positive value. When $G_{vu}(\log \varepsilon)$ is concave,
$\delta$ can be infinity, so that the convergence of LBP holds for $\forall \log \varepsilon > 0$. However, because
$G_{vu}(\log \varepsilon)$ is not guaranteed to be concave, we only obtain local convergence for an
infinitesimal $\delta$.

Define f WjWi {e WjWi ) = log mI^^^I^ — • Thus ' we have the first derivative of 
G vu ({log e WiV }, loge) as follows: 
5G„ u ({log e mv },loge) 



Q\ g £ jL^i Jw i v L^i JWjWi— Jw r w q 

Wi&T v \u Wj&F Wi \v w r eT mq \w p 
where $f' = \partial f / \partial(\log \varepsilon)$. Plugging $\log \varepsilon = 0$ into the previous equation, we

obtain our non-uniform convergence condition. ■

Proof of Property 3

Proof Let us analyze the fixed points by solving the set of equations

$$y = F(x) \qquad (25a)$$
$$x = F(y) \qquad (25b)$$

which corresponds to second-order periodicity $x = F^2(x)$. The set of equations is depicted
in Fig. 9 for $a > b$ and $a < b$ respectively. We can easily see that $F(x)$ and $F(y)$ are
symmetric with respect to $y = x$. Moreover, because $F(x)$ is symmetric about the point
$(0.5, 0.5)$, we have $F(1 - x) = 1 - F(x)$. Therefore, $F(x)$ and $F(y)$
are also symmetric with respect to $y = 1 - x$. Let us check whether the two functions are
symmetric with respect to other lines such as $y = \beta + \alpha x$. Substituting $y = \beta + \alpha x$ and
$x = (y - \beta)/\alpha$ in (25a), we have

$$\beta + \alpha x = \frac{a (y - \beta)^k + b (\alpha - (y - \beta))^k}{(a + b)\left((y - \beta)^k + (\alpha - (y - \beta))^k\right)}.$$

For this equation to be always equivalent to (25b), we need $(\alpha = 1, \beta = 0)$ or
$(\alpha = -1, \beta = 1)$. Thus, the set of equations is only symmetric with respect to $y = x$
and $y = 1 - x$.

When $y = F(x)$ and $x = F(y)$ intersect, they must have crossing points on $y = x$
or $y = 1 - x$. In the following, we show that they do not cross elsewhere. When
$a > b$, let us assume these two functions have one crossing point $A$ that lies on neither
$y = x$ nor $y = 1 - x$, as illustrated in Fig. 9(a). Due to the symmetry between $F(x)$ and
$F(y)$, they must have three other crossing points $B$, $C$ and $D$, shown in Fig. 9(a).
Both functions must go through those points. The first derivative of $F(x)$ is

$$F^{(1)}(x) = \frac{k (a - b)\, x^{k-1} (1 - x)^{k-1}}{(a + b)\left((1 - x)^k + x^k\right)^2} \; \begin{cases} > 0, & a > b \\ < 0, & a < b, \end{cases}$$

which shows that $F(x)$ is either monotonically increasing or monotonically decreasing.
Because $y_B < y_A$ when $x_B > x_A$, we arrive at a contradiction with the monotonically
increasing property under the condition $a > b$. A similar result holds for $a < b$. According
to Property 2, $y = F(x)$ and $x = F(y)$ have at most three real crossing points with an
arbitrary line. Therefore, the set of equations has at most three crossing points with either
$y = x$ or $y = 1 - x$.

The set of equations in (25a) and (25b) has a naive fixed point $(0.5, 0.5)$. However, it is
only stable when the set of equations crosses nowhere else on $y = x$ and $y = 1 - x$. When
$a > b$ and $F^{(1)}(\frac{1}{2}) = \frac{k(a - b)}{a + b} > 1$, the belief network will converge either
at fixed point $E$ or at fixed point $F$ on $y = x$ in Fig. 9(a). In this case, the fixed point
at $x = 0.5$ is an unstable point. When $a < b$ and $F^{(1)}(\frac{1}{2}) < -1$, the belief network will
eventually oscillate between $E$ and $F$ on $y = 1 - x$, as shown in Fig. 9(b). The fixed
point at $x = 0.5$ is again an unstable fixed point. Because $F(x)$ is symmetric with respect
to $(x = 0.5, y = 0.5)$, points $E$ and $F$ are symmetric with respect to $(x = 0.5, y = 0.5)$. ■
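The two regimes described in this proof can be reproduced by iterating the scalar map. The sketch below is an illustration under stated assumptions: the form of `F` is inferred from the derivative given in the proof and from the symmetry $F(1 - x) = 1 - F(x)$, and the parameters ($k = 3$, with the pair 0.7 and 0.3 taken in both orders) are example values.

```python
def F(x, a, b, k):
    """Scalar message map; form inferred from the stated derivative (assumption)."""
    u, v = x ** k, (1.0 - x) ** k
    return (a * u + b * v) / ((a + b) * (u + v))

def iterate(x0, a, b, k, n=500):
    """Apply F n times starting from x0."""
    x = x0
    for _ in range(n):
        x = F(x, a, b, k)
    return x

k = 3
# a > b with F'(1/2) = k(a-b)/(a+b) = 1.2 > 1: convergence to a fixed point on y = x.
x_fix = iterate(0.6, 0.7, 0.3, k)
# a < b with F'(1/2) = -1.2 < -1: oscillation between two points on y = 1 - x.
x_e = iterate(0.6, 0.3, 0.7, k)
x_f = F(x_e, 0.3, 0.7, k)

print(x_fix, F(x_fix, 0.7, 0.3, k))  # the two values coincide at a fixed point
print(x_e, x_f, x_e + x_f)           # period-2 points sum to one
```

In the first case the iterate settles away from 0.5. In the second case it alternates between two values that are mirror images about 0.5, matching the symmetry of $E$ and $F$ stated at the end of the proof.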